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Abstract 

A method  is  given  for  solving  Ax  = b and 
mini Ax  - bH.,  where  the  matrix  A is  large  and  sparse. 

The  method  is  based  on  the  bidiagonalization  procedure  of 
Golub  and  Kahan.  It  is  analytically  equivalent  to  the 
method  of  conjugate  gradients  (CG)  but  possesses  more  favor- 
able numerical  properties.  The  Fortran  implementation  of 
the  method  (subroutine  LSQR)  incorporates  reliable  stopping 
criteria  and  provides  estimates  of  various  quantities  includ- 
ing standard  errors  for  x and  the  condition  number  of  A. 
Numerical  tests  are  described  comparing  LSQR  with  several 
other  CG  algorithms.  Further  results  for  a large  practical 
problem  illustrate  the  effect  of  pre-conditioning  least- 
squares  problems  using  a sparse  LU  factorization  of  A. 
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1.  INTRODUCTION 

A numerical  method  is  presented  here  for  computing  a solution  x to 
either  of  the  following  problems: 

Unsymnetric  equations:  solve  Ax  = b 

Linear  least  squares:  minimize  ||Ax  - b||2 

where  A is  a real  matrix  of  dimensions  m by  n and  b is  a real  vector.  It 
will  usually  be  true  that  min  and  rank(A)  = n,  but  these  conditions  are 
not  essential.  The  method,  to  be  called  algorithm  LSQR,  is  similar  in 
style  to  the  well-known  method  of  conjugate  gradients  ( CG ) as  applied  to 
the  least-squares  problem  (Hestenes  and  Stiefel  I 111) . The  matrix  A is 
used  only  to  compute  products  of  the  form  Av  and  ATm  for  various  vectors  v 
and  u.  Hence  A will  normally  be  large  and  sparse  or  will  be  expressible  > 

as  a product  of  matrices  that  are  sparse  or  have  special  structure.  A 
typical  application  is  to  the  large  least-squares  problems  arising  from 
analysis  of  variance. 

CG-likc  methods  are  iterative  in  nature.  They  are  characterized  by 
their  need  for  only  a few  vectors  of  working  storage  and  by  their 
theoretical  convergence  within  at  most  n iterations  (if  exact  arithmetic 
could  be  performed) . In  practice  such  methods  may  require  far  fewer  or 
far  more  than  >.  iterations  to  reach  an  acceptable  approximation  to  x.  The 
methods  are  most  useful  when  A is  well-conditioned  and  has  many  nearly 
equal  singular  values.  These  properties  occur  naturally  in  many  applica- 
tions. In  other  cases  it  is  often  possible  to  divide  the  solution 
procedure  into  a direct  and  an  iterative  part,  such  that  the  iterative 
part  has  a better  conditioned  matrix  for  which  CG- like  methods  will 
converge  more  quickly.  Some  such  transformation  methods  are  considered 
here. 
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Algorithm  LSQR  is  based  on  the  bidiagonalization  procedure  of  Golub 
and  Kahan  [101.  It  generates  a sequence  of  approximations  {x-^}  such  that 
the  residual  norm  ||^||2  decreases  monotonically,  where  = b - Ax 
Analytically  the  sequence  {x^}  is  identical  to  the  sequence  generated  by 
the  standard  CG  algorithm  and  by  several  other  published  algorithms.  How- 
ever, LSQR  is  shown  by  example  to  be  numerically  more  reliable  in  various 
circumstances  than  the  other  methods  considered. 

The  Fortran  implementation  of  LSQR  is  designed  for  practical  applica- 
tion. It  incorporates  reliable  stopping  criteria  and  provides  the  user 
with  computed  estimates  of  the  following  quantities:  x,  r = h - Ax,  A r, 

II HI 2*  IMI^»  standard  errors  for  x,  and  the  condition  number  of  A. 

1.1  Notation 

Matrices  will  be  denoted  by  A,  3,  . ..,  vectors  by  v , w , . ..,  and 
scalars  by  a,  3,  ...  . An  exception  is  a and  s which  will  denote  the 
significant  components  of  an  elementary  orthogonal  matrix,  such  that 
a +8  = 1.  For  a vector  y,  j|y||  will  always  denote  the  Euclidean  norm 

||y||2  = (vTv)3? . For  a matrix  A , ||a||  will  usually  mean  the  Frobenius  norm, 
||/l||f  = fEaf  J , and  the  condition  number  for  an  unsymnetric  matrix  A is 
defined  by  cond(VU  = |U||  ||i4*]|  where  A+  denotes  the  pseudo- inverse  of  A . 

The  relative  precision  of  floating-point  arithmetic  will  be  c,  the 
smallest  machine- representable  number  such  that  1 + e > 7. 

I 

I ♦ 
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2.  MOTIVATION  VIA  '110-:  LANCZOS  PROCESS 

In  this  section  we  review  the  symmetric  Lanczos  process  [13]  and  its 
use  in  solving  symmetric  linear  equations  Mx  = b.  Algorithm  LSQR  is  then 
derived  by  applying  the  Lanczos  process  to  a particular  symmetric  system. 
Although  a more  direct  development  is  given  in  section  4,  the  present 
derivation  may  remain  useful  for  a future  error  analysis  of  LSQR,  since 
many  of  the  rounding  error  properties  of  the  Lanczos  process  are  already 
known  (Paige  [19]). 

Given  a symmetric  matrix  M and  a starting  vector  b,  the  Lanczos  process 
is  a method  for  generating  a sequence  of  vectors  [v .}  and  scalars  {a.},  (6.) 

t t t 

such  that  M is  reduced  to  tridiagonal  form.  A reliable  computational  form 
of  the  method  is  as  follows:  - 


The  Lanczos  process  (Reduction  to  tridiagonal  form) 

(a) 

(b) 


Vi  = h- 


w.  = Mv.  - 8.u.  , 

t t t t-i 


a.  = v.  w. 

t t t 


8 . y.  . * u.  - a. y. 

t+i  t+l  t tt 


t =7,2,.. 


(2.1) 


where  v e 0 and  each  8^  s 0 is  chosen  so  that  j|it.||  =7  (t  > 0) . Prior  to 
termination  at  the  first  zero  8^+1,  the  situation  after  k steps  is  summarized 
by 


MV, 


VkTk  * 6k+A+iek 


(2.2) 


where  T = tridiag(8if  ou,  8i+1)  and  V ^ = [y  #y  ,...,y^].  If  exact  arith- 
metic is  used  then  V^V^  = I and  so  B^+1  must  vanish  for  i ■ n,  if  not 
before,  but  in  any  event  equation  (2.2)  holds  to  within  machine  precision. 


Now  suppose  we  wish  to  solve  the  symmetric  system  Mx  = b.  Multiplying 
(2.2)  by  an  arbitrary  k-vector  y ^ whose  last  element  is  n^,  gives  = 

VkTkyk  + Vlyk+ink‘  ^ince  Vk^ iei^  ~ ^ by  definition,  it  follows  that  if 
y^  and  Xy  are  defined  by  the  equations 


T^k  = Vi 

xk  = Vkyk 


(2.3) 


(2.4) 


then  we  shall  have  Mx^  = b + n k^k+^k+i  to  workin6  accuracy.  Hence  may 
be  taken  as  the  exact  solution  to  a perturbed  system,  and  will  solve  the 
original  system  whenever  n^8^+1  is  negligibly  small. 

The  above  arguments  are  not  complete,  but  they  provide  at  least  some 
motivation  for  defining  the  sequence  of  vectors  { x according  to  equations 
(2.3)  and  (2.4).  It  is  now  possible  to  derive  several  iterative  algorithms 
for  solving  Mx  = b,  each  characterized  by  the  manner  in  which  y^  is 
eliminated  from  (2.3)  and  (2.4)  (since  it  is  not  practical  to  compute  each 
y £ explicitly).  In  particular,  the  method  of  conjugate  gradients  is  known 
to  be  equivalent  to  using  the  Cholesky  factorization  T ^ and  is 

reliable  when  M (and  hence  T^)  is  positive  definite,  while  algorithm  SYMMLQ 
employs  the  orthogonal  factorization  7^  = to  retain  stability  for 

arbitrary  symmetric  M.  (See  Paige  and  Saunders  [21]  for  further  details  of 
these  methods.) 

We  now  turn  to  a particular  symmetric  (but  indefinite)  system,  namely 


(2.5) 


which  is  well  known  to  solve  the  least-squares  problem,  min  ||/kr  - b\\ , for 
arbitrary  A and  b.  If  the  I,anczos  process  is  applied  to  the  partitioned 


6 


matrix  and  rhs  vector  in  (2.5),  the  recurrence  relations  (2.1)  simplify 
to  give  one  of  the  bidiagonaliz..  ion  procedures  to  be  discussed  in  the  next 
section.  Furthermore,  after  2k+l  iterations  the  tridiagonal  system  corres- 
ponding to  (2.3)  has  the  form 


(2.6) 


following  appropriate  change  in  notation. 

System  (2.6)  is  indefinite  and  could  be  dealt  with  as  in  algorithm 
SYIWLQ,  i.e.,  using  the  orthogonal  factorization  = ^2k+i®2k+r 

However,  this  factorization  proves  to  be  essentially  the  same  as  for  a 
general  tridiagonal  matrix  without  any  convenient  simplification.  Thus  a 
specialized  form  of  SYNMLQ  could  capitalize  on  the  simplified  form  of  the 
Lanczos  process,  but  it  would  not  take  full  advantage  of  the  structure 
inherent  in  (2.6). 

Instead  we  observe  that  a simple  symmetric  permutation  of  (2.6)  gives 
an  equivalent  system  of  the  form 


7 

where  = bidiagfB^oul  is  a lower  bidiagonal  matrix  of  order  k + i by  k. 
Comparing  (2.7J  with  (2.5)  immediately  reveals  another  least-squares 
problem,  min  . This  does  have  a convenient  structure  and  can 

be  solved  reliably  using  an  orthogonal  factorization  of  Bk,  as  in  the 
least-squares  algorithm  of  Golub  [9]. 

To  summarize:  an  application  of  the  Lanczos  process  to  the  problem 
min  ||^x  - b\\  leads  to  a series  of  subproblems  min  || which  can 

be  effectively  dealt  with  using  a conventional  QR  factorization  of  B^.  This 

i 

observation  forms  the  basis  for  algorithm  LSQR. 

i 
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3.  DIF.  BIDIAUONALI ZATION  PROCEDURES 


! ' 


i 

| 


t 


In  the  previous  section  it  was  noted  that  the  recurrence  relations 
(2.1)  simplify  when  the  Lanczos  process  is  applied  to  the  least-squares 
system  (2.5).  (In  fact  both  V and  T have  special  structure.)  The  result 
is  one  form  of  the  hi  diagonal izat ion  procedure  of  Golub  and  Kahan  [10).  For 
future  reference  we  shall  state  the  procedure  in  two  different  forms  and 
give  some  unexpected  relationships  between  the  forms.  It  will  then  be 
possible  to  derive  algorithm  LSQR  directly  and  relate  it  to  various  other 
algorithms  that  have  been  proposed. 

For  brevity  we  shall  call  the  procedures  Bidiag  1 and  2.  The  notation 
used  emphasizes  the  connections  between  them. 


Bidiag  1 (Starting  vector  b;  reduction  to  lower  bidiagonal  form) 


(a) 

81*1 

= b , a1w1  = . 

(b) 

®t+lMi+1 

ai+i’Ji+i 

> 

= Av . - a .u . 

1 1 i 

T 

= A Lu  . . - 6 . v . 
i+l  1+1  i 

i - 1 , 2,  ... 

(3.1) 

The  scalars  a.  > 0 

i 

and  3.  2 0 are  chosen 

i 

so  that  ||u£||  = ||w£||  = 1. 

With 

the  definitions 

Uk  ~ cV“2'***'“fc]  ’ Bk  = 

^ = f V ^ 3 ^2*  * * * * ^ » 


(where  is  the  rectangular  matrix  introduced  in  section  2),  the  recurrence 


B2  a2 


efc+i 


I 
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relations  (3.1)  may  be  rewritten  as 

WV  ,)mb  ’ (3-2) 

Airfc  = **+,**  . (3-3) 

A\*  = %T  + Vi<W*+iT  • f3-4^ 

If  exact  arithmetic  were  used  then  we  would  also  have  U,  ,TU,  = I and 

k+ 1 fe+i 

T 

= I,  but  in  any  event  the  above  equations  hold  to  within  machine 
precision. 

T 

Bidiag  2 (Starting  vector  A b;  reduction  to  upper  bidiagonal  form) 

(a)  01w1  = ATb  , p1p1  = Avy  . 

0»  ‘ -ffi  1 . . , <3'S) 

> t = 1,  2,  ... 

^vi+ i J 
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A\ = W + »k^kW  * 


and  with  exact  arithmetic  we  would  also  have  Pj^P^  m K I- 


(3.8) 


Bidiag  2 is  the  procedure  originally  given  by  Golub  and  Kahan  (with 
the  particular  starting  vector  A b ) . Either  procedure  may  be  derived  from 
the  other  by  choosing  the  appropriate  starting  vector  and  interchanging  A 


and  j4T. 


3. 1 Relationship  between  the  bidiagonalizations 

The  principal  connection  between  the  two  bidiagonalization  procedures 
is  that  the  matrices  are  the  same  for  each,  and  that  the  identity 


Bk\  = *k\ 


(3.9) 


holds.  This  follows  from  the  fact  that  v.  is  the  same  in  both  cases  and  V, 

i k 

is  mathematically  the  result  of  applying  k steps  of  the  Lanczos  process 
(2.2)  with  M - AtA.  The  rather  surprising  conclusion  is  that  R^  must  be 
identical  to  the  matrix  that  would  be  obtained  from  the  conventional  QR  . 
factorization  of  B^.  Thus 


(3.10) 


where  Q ^ is  orthogonal.  In  the  presence  of  nunding  errors  these  identities 
will  of  course  cease  to  hold.  However,  they  throw  light  on  the  advantages 
of  algorithm  LSQR  over  two  earlier  methods,  LSCG  and  LSLQ,  as  discussed  in 
section  7.4. 

The  relationship  between  the  orthonormal  matrices  U ^ and  can  be 
shown  to  be 


Uk*l  " iUk  uk+ l] 


r,^i 


(3.11) 


11 


>i 


for  some  vector  r^.  We  also  have  the  identities 


2 Q 2 2 
ai  + B2  = 


Vi  = 91 


ai  + 6t+12  = °i  + Qi  * Vt  = P,--i0£  f°r  7'  W • 


(3.12) 


. 


t 
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4.  ALGOR  I TIM  LSQR 


The  quantities  generated  from  A and  b by  Bidiag  1 will  now  he  used 
to  solve  the  least -squares  problem,  min  \\b  - Ax\\  . 

Let  the  quantities 


Vkyk 

(4.1) 

rk = 

b - Axk 

(4.2) 

k+ 1 * 

Vi  - Bkyk 

(4.3) 

be  defined  in  terms  of  some  vector  y y It  readily  follows  from  (3.2) 
and  (3.3)  that  the  equation 


rk = 


(4.4) 


holds  to  working  accuracy.  If  the  columns  of  Uk  were  exactly  orthogonal 
we  would  have  II  r*JI  = l^k+i  II » which  immediately  suggests  choosing  y k to 
minimize  ||tfc+1||.  Hence  we  are  led  naturally  to  the  least-squares  problem 


min  116,6, 

yk 


• V*11 


(4.5) 


which  forms  the  basis  for  LSQR. 


Computationally  it  is  advantageous  to  solve  (4.5)  using  the  standard 
QR  factorization  of  (Golub  [91),  i.e.  the  same  factorization  (3.10) 
that  links  the  two  bidiagonal izations.  This  takes  the  form 


(4.6) 


where  = ^k,k+y'  ”^2  3^1  2 *s  a Pr0t*uct  °f  plane  rotations  (e.g. 
Wilkinson  [27])  designed  to  eliminate  the  subdiagonals  82,  8^,  ...  of  B^. 
The  vectors  y ^ and  tfe+1  could  then  be  found  from 


Rtfh  = fk 


**♦1  " «*  x - 
♦fc+1 


C4.7) 


(4.8) 


However,  in  (4.7)  will  normally  have  no  elements  in  common  with  y^  . 
Instead  we  note  that  /fc]  is  the  same  as  /“fc  ] with  a new  row 

and  colimn  added.  Hence,  one  way  of  combining  (4.1)  and  (4.7)  efficiently 
is  according  to 


xk 3 VA\  E Vik 


(4.9) 


where  the  colunns  of  D ^ = [d^  d2  ...  d^ ] can  be  found  successively  from 
the  system  Rj^Dj^  m V k by  forward  substitution.  With  dQ  - xq  = 0 this 


gives 


dk  * pk  K ' Vk-J  » 

xk  " + *kdk 


(4.10) 


(4.11) 


and  only  the  most  recent  iterates  need  be  saved.  The  broad  outline  of 
algorithm  LSQR  is  now  conplete. 
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4. 1 Recurrence  relations 

The  QR  factorization  (4.6)  is  determined  by  constructing  the  k-th 
plane  rotation  Q k ^+1  to  operate  on  rows  k and  k+ i of  the  transformed 
(A k ) to  annihilate  This  gives  the  following  sinple  recurrence 

relation: 


ck 

CO 

** 

J 

la* 

1 

0 

pk  9k+ 1 

ak 

1 

O 

1 

4* 

CQ 

1 

Vi  5 

0 Vi  Vi 

where  pl  E a1 , 4^  = 3.,,  and  the  scalars  and  aie  the  nontrivial 
elements  of  ^fe+1-  The  quantities  PyAk  are  intermediate  scalars  that 
are  subsequently  replaced  by  P^’^k' 

The  rotations  Qk  ^+1  are  discarded  as  soon  as  they  have  been  used  in 
(4.12),  since  Qk  itself  is  not  required.  We  see  that  negligible  work  is 
involved  in  confuting  the  QR  factorization  to  obtain  Rk,  fk  and  ^ . 

Some  of  the  work  in  (4.10)  can  be  eliminated  by  using  vectors 
wk  ~ pk^k  P^ace  dy.  The  main  steps  of  LSQR  can  now  be  summarized 

as  follows.  (As  usual  the  scalars  a.  2:  0 and  3 . t 0 are  chosen  to  normalize 

t t 

the  corresponding  vectors;  for  exanple,  ot^  = ATuy  implies  the  amputa- 
tions ^ = a\,  a,  = ||,  v,  = (l/<xy)vv) 


, 
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1.  (Initialize.) 


eiul  = b * aiyi  = A * wi  = yi  * 


d>  = 8 . d = a 

M ’ Mi  1 


2.  For  i = 1,2,3,...  repeat  steps  3-6. 


3.  (Continue  the  bidiagonalization.) 


(a)  = Avi  - aiui 

(b)  at+iut+i  = A\+i  - Pi+i vi 


4.  (Construct  and  apply  next  orthogonal  transformation.) 


x = 0 , 

o ’ 


2 


2,* 


(a)  p.  = rp.  ♦ p.+// 

(b)  c.  = p./p. 

(c) 

(d)  ei+i  = 8tai+i 

(e)  p<+i  = ~°iai+ i 


(f)  ^ ^ 


(g)  *i+1  * a^.  _ 


5.  (Update  x,u.) 


(a>  *i  ‘ xi-<  * rn/0£J"t 
fb)  "i.,  ’ 


6.  (Test  for  convergence.) 

Exit  if  some  stopping  criteria  (yet  to  be  discussed)  have  been  met. 
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5.  ESTIMATION  OF  NORMS 

Here  we  show  that  estimates  of  the  quantities  ||r*^||  , ||i4Tr^||  , ||x^||  , 

||A||  and  cond ( A ) can  be  obtained  at  minimal  cost  from  items  already  required 
by  LSQR.  All  five  quantities  will  be  used  later  to  formulate  stopping  rules. 

Knowledge  of  ||a||  and  perhaps  cond  (A)  can  also  provide  useful  debugging 

information.  For  example,  a user  must  define  his  matrix  A by  providing  two 

subroutines  to  compute  products  of  the  form  Av  and  A u.  These  subroutines 

will  typically  use  data  derived  from  earlier  computations,  and  may  employ 

rather  complex  data  structures  in  order  to  take  advantage  of  the  sparsity 

of  A . If  the  estimates  of  ||/l||  and/or  cond  (A)  prove  to  be  unexpectedly  high 

or  low  then  at  least  one  of  the  subroutines  is  likely  to  be  incorrect.  As 

a rule  of  thumb  we  recommend  that  all  columns  of  A be  scaled  to  have  unit 

length  (||Ae -||  = 1 1 j = 1 , n) , since  this  usually  removes  some 

J 

unnecessary  ill -conditioning  from  the  problem.  Under  these  circumstances, 
a programing  error  should  be  suspected  if  the  estimate  of  ||/l||  differs  by  a 
significant  factor  from  n (since  the  particular  norm  estimated  will  be 

IWIF). 

For  the  purposes  of  estimating  norms  we  shall  often  assume  that  the 
orthogonality  relations  = I and  * I hold,  and  that 

|| ||  2 = II  VpW  2 * 1.  In  actual  computations  these  are  rarely  true,  but  the 
resulting  estimates  have  proved  to  be  remarkably  reliable. 


S.l  Estimates  of  |jr?  ||  and  ||ATr?i;|| 


From  (4.4)  and  (4.8)  we  have 

rk  m \^ukWek+y 

(which  explains  the  use  of  r ^ in  (3.11))  and  hence  by  assuming 
uc  obtain  the  estimate 


(5.1) 


• I 


1 
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W “ ♦fc+i  " 8iVk-r-*ai  • (5-2) 

where  the  form  of  follows  from  (4.12).  LSQR  is  unusual  in  not 

having  the  residual  vector  explicitly  present,  but  we  see  that  ||r^|| 
is  available  essentially  free.  Clearly  the  product  of  sines  in  (5.2) 
decreases  monotonically.  It  should  converge  to  zero  if  the  system  Ax  = b 
is  compatible.  Otherwise  it  will  converge  to  a positive  finite  limit. 

m 

For  least-squares  problems  a more  important  quantity  is  A which 
would  be  zero  at  the  final  iteration  if  exact  arithmetic  were  performed. 

From  (5.1),  (3.4)  and  (4.6)  we  have 


A rk  KBk  + ak+iyk+lek+i  ^®k  ek+'i 

-**./*[V  °] «*♦, 

+ ^k+iak+l ^ek+l  ®k  ek+^Vk+ 1 

The  first  term  vanishes  and  it  is  easily 

seen  that  the  (k+l)th  diagonal 

of  is  -Oy  Hence  we  have 

A rk  = " (♦k+iafe+ick)7k+i 

(5.3) 

and 

= ^+iak+1lcfel 

(5.4) 

to  working  accuracy.  No  orthogonality  assumptions  are  needed  here. 

5.2  An  estimate  of  Hx^.H 

The  upper  bidiagonal  matrix  Ry  may  be  reduced  to  lower  bidiagonal 
form  by  the  orthogonal  factorization 

Vk  ■ h tS.S) 

where  Q ^ is  a suitable  product  of  plane  rotations.  Defining  z ^ by  the 
system 

V*  ’ h • <5-6> 
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it  follows  that  x*  = (V^R^  ll/*  = (V^Q^  )z^  = J/^*  . Hence,  under  the 

T 

assumption  that  F*  F*  = I we  can  obtain  the  estimate 

11**11  = 11**11  • (5.7) 

Note  that  the  leading  parts  of  £*,  £*,  V*  and  z ^ do  not  change  after 
iteration  k.  Hence  we  find  that  estimating  ||x*||  via  (5. 5) -(5. 7)  costs  only 
13  multiplications  per  iteration,  which  is  negligible  for  large  n. 


5.3  Estimation  of  ||/i||^  and  condf/ll 


It  is  clear  from  (3.1)  that  all  the  v ^ lie  in  the  range  of  A 1 and  are 

T 

therefore  orthogonal  to  the  null  space  of  A and  A A.  With  appropriate 
orthogonality  assumptions  we  have  from  (3.3)  that 

BkBk  = Vk  ^ A Vk 


and  so  from  the  Courant-Fischer  minimax  theorem  the  eigenvalues  of  #*T#* 
are  interlaced  by  those  of  A A and  are  bounded  above  and  below  by  the 
largest  and  smallest  nonzero  eigenvalues  of  A^A.  The  same  can  therefore 
be  said  of  the  singular  values  of  0*  compared  with  those  of  A.  It  follows 
that  for  the  2-  and  F-norms, 


11**11  * Ml  , (5.8) 

where  equality  will  be  obtained  in  the  2-norm  for  some  k s rankf/U  if  b 
is  not  orthogonal  to  the  left-hand  singular  vector  of  A corresponding  to 
its  largest  singular  value.  Equality  will  only  be  obtained  for  the  F-norm 
if  b contains  components  of  all  left-hand  singular  vectors  of  A corres- 
ponding to  nonzero  singular  values.  Nevertheless  we  will  use  ||f*||^,  as  a 
monotonically  increasing  estimate  of  the  size  of  A. 

The  foregoing  also  implies  that  0*T0*  = /?*Tfl*  is  nonsingular  and  for 
the  2-  and  F- norms 


i 
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Ill'll  " llVll  5 IM*II  • (5-9) 

The  remarks  on  equality  are  the  same,  except  now  "largest  singular  value" 
is  replaced  by  "smallest  nonzero  singular  value". 

Combining  these  results  with  the  definition  in  (4.9)  now 

gives 

2 ^ ||flfc||  ll^ll  * IMI  ||/||  = cond (A)  (5.10) 

for  the  2-  and  F-norms.  Hence  we  take  II^II^II^Hp  as  a monotonically 
increasing  estimate  of  condf/U , which  starts  at  the  optimistic  estimate 

KUI^IIf  = 2. 

Use  of  Frobenius  norms  allows  the  estimates  to  be  accumulated 

cheaply,  since  ||fife||F2  = ||flfe_1||/,2  + otfe2  + Sfe+12and  ||^||2  = || DR_^  + ||^||2. 

The  individual  terms  in  the  sum  \\cL\\  = E 6.,  can  be  used  further  for 

K i=1 

estimating  standard  errors,  as  we  show  next. 


5.4  Standard  errors 


In  regression  problems  with  m > n = rank(/U  the  standard  error  in 


the  i-th  component  of  the  true  solution  x is  taken  to  be  where 


2 _ \\b  - Ax\ 


m - n 


•a. . 

11 


T t — 1 ... 

and  a..  = e.  (A  A)  e.  is  the  i-th  diagonal  element  of  (A  A) 


11 


(5.11) 


Now  from 


(3.3)  and  (3.10)  we  have  V^A^AV^  = which  with  (4.9)  gives 

I . 


DkA H 


Assuming  that  premature  termination  does  not  occur,  it  follows  that  with 
exact  arithmetic  * (ATA)  1 , and  we  can  approximate  the  o „ by 


1 


I 
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o./^  r e.T/j  n Te.  . Since  d,  £>, T = D,  D,  1 + d,d,  r , we  have 
i k k i k k k-i  k-1  k k ' 

_ (k)  _ n (k- \)  t 2 ro;  _ 

0..  — O*.  + o . . , 0 • . — d , 

11  11  ifc  11 


and  the  o..  are  monotonically  increasing  estimates  of  the  a..  . 

ii  ' 

(k) 

In  the  implementation  of  LSQR  we  accumulate  for  each  i,  and 

upon  termination  at  iteration  k we  set  l = ma x(m-n,l)  and  output  the  square 


roots  of 


. 11^-^feH2  (k) 


— o . . 

ii 


as  estimates  of  the  a.  in  (5.11).  The  accuracy  of  these  estimates  cannot 
he  guaranteed t especially  if  termination  occurs  early  for  some  reason. 
However,  we  have  obtained  one  reassuring  comparison  with  the  statistical 
package  GLIM  (Nelder  [17]). 

On  a moderately  ill-conditioned  problem  of  dimensions  171  by  38, 

(condM)  - lo\  relative  machine  precision  = Z0-11),  an  accurate  solution 

(k ) 

x.y  was  obtained  after  69  iterations,  and  at  this  stage  all  s ^ agreed  to 
at  least  one  digit  with  the  output  by  GLIM,  and  many  components  agreed 
more  closely. 

A further  comparison  was  obtained  from  the  1033  by  434  gravity-meter 
problem  discussed  in  section  10.3.  For  this  problem  a sparse  QR  factoriza- 


tion was  constructed,  QA 


ra- 


and  the  quantities  o..  were  computed 


accurately  using  fry.  = e.,  o..  = ||y.||  . Again  the  estimates  of  s. 

from  LSQR  proved  to  be  accurate  toat  least  one  significant  figure,  and  the 
larger  values  were  accurate  to  three  or  more  digits. 

2 

Note  that  ».  estimates  the  variance  of  the  i-th  component  of  x,  and 

(k)2 

that  e . approximates  this  variance  estimate.  In  an  analogous  manner  we 
could  approximate  certain  covariance  estimates  by  accumulating 
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„ (k)  _ n (k-M  ^ . 

ij  °ij  ik  jk  3 


o.!o)  I o , 
1.1 


for  any  specific  pairs  and  then  computing 


~ Axk^  Q (k) 
l ij 

on  termination.  This  facility  has  not  been  implemented  in  LSQR  and  we  have 
not  investigated  the  accuracy  of  such  approximations.  Clearly  only  a limited 
number  of  pairs  ( i,j)  could  be  dealt  with  efficiently  on  large  problems. 
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6.  STOPPING  CRITERIA 

An  iterative  algorithm  must  include  rules  for  deciding  whether  the 
current  iterate  x^  is  an  acceptable  approximation  to  the  true  solution  x. 
Here  we  shall  formulate  stopping  rules  in  terms  of  three  dimensionless 
quantities  ATOL,  BTOL  and  CONLIM,  which  the  user  will  be  required  to 
specify.  The  first  two  rules  apply  to  compatible  and  incompatible  systems 
respectively.  The  third  rule  applies  to  both.  They  are: 

SI : Stop  i f 

I S2:  Stop  if 

1 

S3:  Stop  if 

We  can  implement  these  rules  efficiently  using  the  estimates  of  ||r-^||  , ||/i||„, 
etc.,  already  described. 

The  criteria  SI  and  S2  are  based  on  allowable  perturbations  in  the 
data.  The  user  may  therefore  set  ATOL  and  BTOL  according  to  the  accuracy 
of  the  data.  Por  example,  if  (A  ,b)  is  the  given  data  and  (A,b)  represents 
the  (unknown)  true  values,  then 

ATOL  - ||/t  - A ||  / ||a|| 

should  be  used  if  an  estimate  of  this  is  available.  Similarly  for  BTOL. 

Criteria  S3  represents  an  attest  to  regularize  ill-conditioned 
systems. 


BTOLll&H  ♦ ATOL||A||  ||*, 


I A ir- 


< ATOL  . 


INI 


cond  (A)  a:  CONLIM 


27, 


i 


6 . 1 Compatible  systems 

To  justify  SI,  let  />k  - b - Ax k as  usual,  and  define  the  quantities 
«*  ' Kh)l\\b\\  + ATOLlMH  ||*fc||  , 

<Jk  = BTOL||/?  ||  ^ , 

hk  = ATOLlMII  ||.^||  ^ . 


Then  rk  = gk  + hk,  and 


h , x 


A + 


o_ 

T 

x,  x, 
k fe 


= * 


so  that  xk  is  the  exact  solution  for  a system  with  both  A and  b perturbed. 
It  can  be  seen  that  these  perturbations  are  within  their  allowable  bounds 
when  the  inequality  in  SI  holds.  Hence,  criterion  SI  is  consistent  with 
the  ideas  of  backward  rounding  error  analysis  and  with  knowledge  of  data 
accuracy.  Since  this  argument  does  not  depend  on  orthogonality,  SI  can  be 
used  in  any  method  for  solving  compatible  linear  systems. 


6.2  Incompatible  systems 


Stewart  [251  has  observed  that  if 


and 

where 


rk-b  - Axk 


h " b - (A  + Ek)xk 

r.  rhTA 

E,  = - -O—  , 

k ii  m2 


T~  - 

then  (A  + Ek)  rk  * 0,  so  that  xk  and  rk  are  the  exact  solution  and  residual 
for  a system  with  A perturbed.  Since  l|^||2  = ||/lTr>fe||  / ||r^|| , the  perturbation 
to  A will  be  negligible  if  the  test  in  S2  is  satisfied. 
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In  our  particular  method  it  happens  that  E^x^ 
that  x^  = is  theoretically  orthogonal  to  4Tr-^. 

both  and  are  exact  for  the  perturbed  system, 

case  for  using  rule  S2. 


= 0,  since  (5.3)  shows 


Hence  r ^ = r^,  so 
This  strengthens  the 


In  practice  we  find  that  ||/1T)"^||  / ||r^||  can  vary  rather  dramatically 
with  k,  hut  it  does  tend  to  stabilize  for  large  k,  and  the  stability  is 
more  apparent  for  LSQR  than  for  the  standard  method  of  conjugate  gradients 
(sec  ||i4Tr^||  in  Figures  3 and  4,  section  8).  Criterion  S2  is  sufficient  to 
ensure  that  x ^ is  an  acceptable  solution  to  the  least-squares  problem,  but 
the  existence  of  an  easily  computable  test  that  is  both  sufficient  and 
necessary  remains  an  open  question  (Stewart  [25]). 


6 . 3 Ill-conditioned  systems 


Stopping  rule  S3  is  a heuristic  based  on  the  following  arguments. 
Suppose  that  A has  singular  values  °1  - °2  * • • • - °n  > It  has  been 
observed  in  some  problems  that  as  k increases  the  estimate  ||fl,  = 

cond(/U  in  (5.10)  temporarily  levels  off  near  some  of  the  values  of  the 
ordered  sequence  c^/o^  ^ /°„>  with  varying  numbers  of  itera- 

tions near  each  level.  This  tends  to  happen  when  the  smaller  are  very 
close  together,  and  therefore  suggests  criterion  S3  as  a means  of 
regularizing  such  problems  when  they  are  very  ill-conditioned,  as  in  the 
discretization  of  ill -posed  problems  (e.g.  Nashed  [16]). 

For  example,  if  the  singular  values  of  A were  known  to  be  of  order 
1,  0.9,  10~\  Id-6,  10~7 , the  effect  of  the  two  smallest  singular  values 
could  probably  be  suppressed  by  setting  CONLIM  = Id4. 


A more  direct  interpretation  of  rule  S3  can  be  obtained  from  the  fact 
that  Xj,  = First,  suppose  that  the  singular  value  decomposition  of  A 
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is  A = UIV*  where  UTU  = Vs  V = WT  = I , L = diag(o1 ,a 2 , . . . ,o  ) , and  let 
/\(r)  = UY.(r)Vr 

be  defined  by  setting  o^+1  = ...  = a = 0.  A common  method  for  regularizing 

( r*)  + T 

the  least-squares  problem  is  to  compute  x = VT  tl  b for  some  r <,  nf 
since  it  can  readily  be  shown  that  the  size  of  is  bounded  according  to 

< -1  = cond (Alr}). 

°r 


In  the  case  of  LSQR,  we  have  ||x^||  < ||z?^||f||i||  and  so  if  rule  S3  has 
not  yet  caused  termination  we  know  that  ||B^||^||x^||  / ||fc||  < 1 11^1 1^1  Ip  < CONLIM. 
Since  ||5^t|p  usually  increases  to  order  ||i4||  quite  early,  we  effectively  have 


F 


II&II 


< CONLIM  , 


which  is  exactly  analogous  to  the  bound  above. 


6.4  Singular  systems 


It  is  sometimes  the  case  that  rankMl  < n.  Known  dependencies  can 
often  be  eliminated  in  advance,  but  others  may  remain  if  only  through  errors 
in  the  data  or  in  formulation  of  the  problem. 

With  conventional  (direct)  methods  it  is  usually  possible  to  detect 
rank  deficiency  and  to  advise  the  user  that  dependencies  exist.  In  the 
.present  context  it  is  more  difficult  to  provide  such  useful  information, 
but  we  recognise  the  need  for  a method  that  at  least  does  not  fail  when 
applied  (perhaps  unknowingly)  to  a singular  system.  In  such  cases  we  again 
suggest  the  parameter  CONLIM  as  a device  for  controlling  the  computation. 

Our  experience  with  LSQR  on  singular  problems  is  that  convergence  to  an 
acceptable  solution  occurs  normally,  but  if  iterations  are  allowed  to 


! 


► 


I 


i 

j 
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continue  the  computed  x^  will  begin  to  change  again  and  then  grow  quite 
rapidly  until 


(while  || r^||  remains  of  reasonable  size).  The  estimate  of  cond64)  typically 
grows  large  before  the  growth  in  A moderate  value  of  GONLIM  (say  l/i**) 
may  therefore  cause  termination  at  a useful  solution. 


In  some  cases  it  can  be  useful  to  set  CONLIM  as  large  as  7/e  and  allow 
x to  diverge.  In  this  context  we  note  that  the  algorithm  SYNMI/)  [21]  can 
be  applied  to  singular  symmetric  systems  and  that  extreme  growth  in  the 
resulting  ||xj,l|  forms  an  essential  part  of  a practical  method  for  computing 
eigenvectors  of  large  symmetric  matrices  (Lewis  TIS]).  By  analogy,  in  the 
presence  of  rounding  errors  LSQR  will  usually  produce  an  approximate  singular 
vector  of  the  matrix  A.  In  fact,  using  (6.1)  and  ||rj|  < ||/>||  we  see  that  the 
normalized  vector  x^  = x.y  / ||x^||  will  usually  satisfy 


7 

11**11 


(h  - rk) 


= 0(t)\\A\  | 

for  large  enough  *,  and  hence  will  lie  very  nearly  in  the  null  space  of  A. 
The  vector  x^  may  reveal  to  the  user  where  certain  unexpected  dependencies 
exist.  Suitable  new  rows  could  then  be  added  to  A , or  certain  linear  con- 
straints could  be  applied  directly,  as  described  in  section  9.4. 
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7.  OTHER  METHODS 

Several  other  conjugate -gradient  methods  are  discussed  here.  All 
except  the  first  (CG1,S)  are  stated  using  notation  consistent  with  sections 
3 - b,  in  order  to  illustrate  certain  analytic  identities. 

7.1  COLS 

If  the  conjugate-gradient  method  for  symmetric  positive  definite 
systems  is  applied  naively  to  the  normal  equations  ATAx  = ATb,  the  method 
does  not  perform  well  on  ill-conditioned  systems.  To  a large  extent  this 

m 

is  due  to  the  explicit  use  of  vectors  of  the  form  A Ap An  algorithm 
with  better  numerical  properties  is  easily  derived  by  a slight  algebraic 
rearrangement,  making  use  of  the  intermediate  vector  Ap ^ (e.g.  Hestenes 
and  Stiefel  [11]).  It  is  usually  stated  in  notation  similar  to  the 
following. 

Algorithm  CGLS 


x = 0 

o 


1.  Set 

v = 

b,  e = A^by 

o 

o ’ 

2.  For 

i = 1, 

2,3  .. . repeat 

(a) 

II 

(b) 

a. 

t 

/ ii  2 

(c) 

xi 

= xt-i + 

(d) 

ri 

* r . , - a. q . 
1-1 

(e) 

e. 

= Arr . 

V 

i 

(0 

■ ll»<ll2 

(g) 

“ V*i-, 

(h) 

1 

• ei  * s<p<  . 
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A practical  implementation  of  the  method  would  also  need  to  conpute 
UrJI  , ||x^.||  and  an  estimate  of  ||/l||  in  order  to  use  the  stopping  criteria 
developed  in  section  6.  Otherwise  the  method  is  clearly  simple  and 
economical.  Analytically  it  generates  the  same  points  x ^ as  LSQR.  There 
is  no  simple  connection  with  either  of  the  bidiagonalizations,  hut  the 
vectors  y.  and  d.  in  LSQR  are  proportional  to  s.  and  p.  respectively. 

7.2  Craig's  method  for  Ax  - b 

A very  simple  method  is  known  for  solving  compatible  systems  Ax  = b.  ‘ 

This  is  Craig's  method,  as  described  in  Faddeev  and  Faddeeva  [7].  It  is 

derivable  from  Bidiag  1 as  shown  by  Paige  [18]  and  differs  from  all  other  1 

methods  discussed  here  by  minimizing  the  error  norm  ||x^  - jc||  at  each  step, 
rather  than  the  residual  norm  ||h  - Ax^\  - |J/W'x^  - x)\\ . We  review  the  ^ , 

derivation  briefly. 

If  Ly  is  the  first  k rows  of  , . . 


then  equations  [3.3)  - [3.4)  describing  Bidiag  1 may  be  restated  as 
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Craig's  method  is  defined  by  the  equations 

Lkzk  = Vi  » xk  = Vk  ’ (7‘2^ 

and  we  can  show  from  (7.1)  that  the  residual  vector  satisfies 
v^~b-  AV^z^  = "Vk+iuk+i  ^ ^ence  yfeTr/;  = 0 • We  can  therefore 
expect  r*£,  to  vanish  (analytically)  for  some  k < n. 

The  vectors  z ^ and  x ^ are  readily  computed  from 

^k  = ~rWCk- 1 * xk  = xk-i  + Vk  ’ 

where  C = -2.  Since  the  increments  y ^ form  an  orthogonal  set  there  is  no 
danger  of  cancellation,  and  the  step-lengths  are  bounded  by  |^|  < ||aj|  = 
||a;^||  < ||x|| . We  can  therefore  expect  the  method  to  possess  good  numerical 
properties.  This  is  confirmed  by  the  comparison  in  section  8. 


7.3  Extension  of  Craig's  method 

A scheme  for  extending  Craig's  method  to  least-squares  problems  was 
suggested  by  Paige  in  [18].  The  vectors  in  (7.2)  were  retained  and  an 
additional  vector  of  the  form  VjW^  was  computed  in  parallel.  On  termination, 
a suitable  scalar  y^  was  computed  and  the  final  solution  taken  to  be 

xk = (W  - VW  E Vk  • (7-3) 


In  the  present  context  this  method  may  be  interpreted  as  a means  of  solving 
the  least -squares  system  (2.7),  viz. 


1 Bk 

tk+^ 

Vi 

X* 

1 

»k 

0 

(7.4) 


using  the  fact  that  the  underdetermined  system  “ 0 has  a unique 


1 


J 
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solution  apart  from  a scalar  multiple, 
according  to 

T 


T 

Thus  if  we  partition  and 


B, 


Vi  Zk 


'k+ 1 ' Y/< 


for  some  y^,  the  system  = -ot^  can  be  solved  easily  by  forward  sub- 

stitution, and  then  from  the  top  half  of  (7.4)  we  obtain  the  required 
equations 

r — 

l 


hwk  = 


'k- 1 


’ yk  = fyc-Vk  ! refe+iwfe  ~ V ’ 


using  straightforward  notation.  The  first  equation  indicates  that  can 
he  computed  by  forward  substitution,  thus  allowing  to  be  accumulated 

as  the  algorithm  progresses.  The  second  shows  how  y^  can  be  computed  at 
each  step  (if  desired)  and  at  the  final  iteration  for  use  in  (7.3). 

In  the  original  presentation  of  this  method  the  danger  of  cancellation, 
in  (7.3)  was  recognized,  in  the  event  that  and  were  large  and 

nearly  equal.  When  the  system  Ax  = b is  compatible  the  danger  does  not 
exist  because  each  vector  l^a,  is  the  same  as  in  Craig's  method  and  remains 
of  reasonable  size.  We  have  observed  in  practice  that  ||w^||  actually 
diverges  with  increasing  k,  but  the  scalars  y^  converge  to  zero  at  a 
greater  rate.  The  points  - y^T.u^,,  if  computed,  are  at  all 

stages  identical  to  those  produced  by  LSQR , and  for  sufficiently  large  k 
are  computationally  just 

When  b does  not  lie  in  the  range  of  A we  find  that  severe  cancellation 
can  occur,  with  the  y^  stabilizing  at  a moderate  value  but  ||a^||  and  ||u^|| 
both  diverging.  For  this  reason  the  extension  of  Craig's  method  must  be 
discarded. 


I 


. 


Note  that  the  divergence  of  \\zy\\  in  these  circumstances  shows  that  Ly 
in  (7.2)  can  be  much  more  ill-conditioned  than  A.  If  Bidiag  1 is  used  as 
in  [17)  to  estimate  the  singular  values  of  A,  it  is  the  singular  values  of 
B y rather  than  t,y  that  arc  relevant. 


7.4  LSCG  and  LSLQ 

A second  algorithm  for  least-squares  problems  was  given  by  Paige  [18). 
This  is  algorithm  LSCG,  based  on  Bidiag  2.  In  the  notation  of  section  3 it 


is  defined  by  the  equations 


Rk  Rk“k  = 0iei 


xk  VJc^k 


and  implemented  in  the  form  = 0^  , Xy  = (V^ )fy  . Given  the 

relations  between  the  two  bidiagonal izations  we  now  recognize  that  this  is 
analytically  equivalent  to  LSQR,  but  numerically  inferior,  since  it  is 
effectively  solving  the  least-squares  problem  min||5^  - B^H  by  using  the 
corresponding  normal  equations.  (The  latter  are  ByByyk  = = a 

and  by  (3.9)  and  (3.12)  this  is  equivalent  to  (7.5a).) 

Algorithm  LSLQ  (Paige  and  Saunders  [20])  is  a refinement  of  LSCG,  but 
again  it  is  based  on  Bidiag  2 and  the  above  normal  equations  and  is  there- 
fore inferior  to  LSQR  on  ill-conditioned  problems.  The  refinement  has  been 
described  in  section  (5.2),  giving  Xy  = where  Wy  is  theoretically 
orthonormal,  the  intention  being  to  avoid  any  possible  cancellation  that 
could  occur  in  accumulating  Xy  = = (v k^)fk  • The  same  refinement 

can  easily  be  made  to  LSQR,  and  it  was  implemented  in  an  earlier  version  of 
the  algorithm  for  the  same  reason.  However,  we  have  not  been  able  to  detect 
any  numerical  difference  between  Xy  = wyzk  and  Xj<  = Dkfk  in  the  two  versions 
of  LSQR,  so  the  fear  of  cancellation  appears  to  have  been  unfounded.  We 
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have  therefore  retained  the  slightly  more  economical  x^  = which  also 

allows  condfAl  to  be  estimated  from  ||l^llp  as  already  described. 

Algoritlims  LSCG  and  LSLQ  need  not  be  considered  further. 

7 . 5 Chen’s  algorithm  RRLS 

Another  algorithm  based  on  Bidiag  2 has  been  described  by  Chen  [41. 

This  is  algorithm  RRLS,  and  it  combines  Bidiag  2 with  the  so-called  residual  - 
reducing  method  of  Householder  [12].  In  the  notation  of  section  3 it  may  be 
described  as  follows.  The  residual -reducing  property  is  implicit  in  steps 
2 f b ) and  2(c). 

Algorithm  RRLS 

T 

1.  Set  vQ  = b,  61u1  = A b,  = 0 . 

2.  For  i = 1,2,3,  ...  repeat  the  following: 


(a) 

pi?i 

= Aw. 

(b) 

A. 

z 

II 

c4. 

c4. 

(c) 

r . 
z 

ri- 1 

A .p . 
zrz 

(d) 

02+iyi+l 

= ATp.  - 

f z 

PiVi 

(e) 

x . 
z 

II 

H 

c4. 

i 

+ 

(h/pi)wi 

(f) 

w . . 
z + } 

ll 

si 

c4. 

+ 

1 

(QuJpz)w 

where  the  scalars  and  0^  are  chosen  so  that  \\p_- 1|  = ||y^.||  = 1 . 

As  with  CGLS,  a practical  implementation  would  also  require  ||r*.||  and 

llx^H.  The  square  root  of  the  sum  (p}  + %i^)  = H^l^2  = H^ll/  could  be 

t = 1 

used  to  estimate  ||d||p  and  ||/lTr^||  can  also  be  estimated  cheaply. 

Note  that  the  vectors  v.  are  generated  as  in  Bidiag  2,  but  the  vectors 
P i come  instead  from  step  2(a).  Substituting  the  latter  into  step  2(d) 
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T 

shows  that  RRLS  requires  explicit  computation  of  the  vectors  A Aw . 
fignoring  normalization  by  p^.).  Unfortunately  this  must  cast  doubt  on  the 
numerical  properties  of  the  method,  particularly  when  applied  to  compatible 
systems.  Indeed  we  find  that  for  some  systems  Ax  - b,  the  final  norms  ||rv.|| 
and  ||x7-  - x||  are  larger,  by  a factor  approaching  cond (A),  than  those 
obtained  by  CGLS  and  LSQR.  This  is  illustrated  in  section  8.3. 

A second  algorithm  called  RRLSL  has  been  described  by  Chen  [43  in 
which  the  residual -reducing  method  is  combined  with  Bidiag  1.  However,  the 
starting  vector  used  is  AArh  ( rather  than  b),  and  products  of  the  form  ATAw^ 
are  again  required,  so  that  improved  performance  seems  unlikely.  Chen 
reports  that  RRLS  and  RRLSL  behaved  similarly  in  all  test  cases  tried. 


In  spite  of  the  above  comments  we  have  also  observed  ill-conditioned 
least -squares  problems  for  which  RRLS  obtains  far  greater  accuracy  than 
would  normally  be  expected  of  any  method  (see  section  8.4  for  a possible 
explanation).  Because  of  this  unusual  behavior  we  have  investigated  a 
residual -reducing  version  of  LSQR  as  now  described. 

7.6  RRLSQR 

If  the  residual  vector  is  explicitly  introduced,  algorithm  LSQR  as 
summarized  in  section  4.1  can  be  modified  slightly.  First,  the  residual- 
reducing  approach  requires  step  5(a)  to  be  replaced  by  the  two  steps 


x • - x . . + X .w . , 


ri  = ri- 1 ' V;  • 

where  p ^ - Aw^  and  X^  - 1 / lip^ll  2 • Hn  this  case  P ^ is  unnormalized.) 
Second,  the  product  Aw-  can  be  used  to  eliminate  4t>.  from  Bidiag  1,  leading 
to  an  alternative  method 


8,  , = Aw-  - 

+ 1 i,  i|7» 


i-ii 


t-i 


(7.6) 


for  generating  each  ft ^ and  u..  (This  result  is  difficult  to  derive,  hut 
the  key  relation  is  Pi/\\pi\ I = o.r,. /||r*i_1||  + , which  may  be 

deduced  f rom  ( 3 . 1 1 ) . ) 

The  remainder  of  LSQR  is  retained,  including  the  QR  factorization  of 
b^.  The  coefficient  of  1 in  (7.6)  can  be  expressed  in  several  ways; 
for  example 

h p£  = J_  (_vi- 1 a1a2---af 
II ill  ft  -.6^ 

where  P £ comes  from  the  system  £.  3^,  = ft  e of  Craig's  method.  Different 
formulae  lead  to  different  iteration  paths  but  no  variation  appears  to  be 
consistently  better  than  the  rest. 

A summary  of  the  resulting  algorithm  follows. 

Algorithm  RRLSQR 

1.  vq  = b,  ft1a1  = b,  a1u1  * ATu^,  - i^,  xq  = 0, 


■ »,• 

pi  = °i  * 

2. 

For  i = 

1,2,3,  ...  repeat  steps  3-6 

3. 

(a) 

P • = AWj 

(bj 

= 

(c) 

ri  = ' V'; 

<d> 

ui+i  = Pf  ' 

(e)  ai+1 

yf+i  = *T“i+i  - 6i+iyi 

4. 

Compute 

Pv  nv  ev  ei+1,  Pi+1,  ?i+1 

5. 

(a) 

x ■ - x . , * A .u . 

t t-i  t i 

fb) 

ut+l  = yt  + i " 

6.  Exit  if  appropriate. 
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This  adaption  of  Bidiag  1 to  obtain  RRLSQR  is  analogous  to  (and  was 
motivated  by)  Chen's  adaption  of  Bidiag  2 to  obtain  RRLS.  Note  however  that 
there  are  no  products  of  the  form  A Aw ^ . In  practice  we  find  that  RRLSQR 
typically  performs  at  least  as  well  as  LSQR,  as  measured  by  the  limiting 
|| x^  - x\\  attainable.  Furthermore,  it  attains  the  same  unusually  high 
accuracy  achieved  by  RRLS  on  certain  ill  conditioned  least-squares  problems. 
On  these  grounds  RRLSQR  could  sometimes  be  the  preferred  method.  However, 
its  work  and  storage  requirements  are  significantly  higher  than  for  the 
other  methods  considered. 

7 . 7 Storage  and  work 

The  storage  and  work  requirements  for  the  most  promising  algorithms 
are  summarized  below.  Recall  that  A is  m by  n and  that  for  least-squares 
problems  m may  be  considerably  larger  than  n.  Craig's  method  is  applicable 
only  to  compatible  systems  Ax  = b,  which  usually  means  m = n. 


Storage 
m n 

Work 
m n 

Craig 

(Ax  = b only) 

u,  Av  x,  v 

3 4 

CGLS 

r , q x,  p,  8 

2 3 1 

LSQR 

u , Av  x,  v,  w 

3 5 

RRLS 

„T 

r,  p x,  v,  w,  A p 

4 5 

RRLSQR 

r,  u,  p x,  v,  w 

6 5 

All  methods  require  the  starting  vector  b.  If  necessary  this  may  be 
overwritten  by  the  first  m- vector  shown  (r  or  u) . The  m-vector  Av  shown 
for  Craig  and  LSQR  represents  working  storage  to  hold  products  of  the 


7>b 

T 

form  Av  and  A u.  (An  rc-vector  would  be  needed  if  m < n.)  In  some  applica- 
tions this  could  be  dispensed  with  if  the  bidiagonal ization  operations 

T 

Av  - a u and  A u - $y  were  implemented  to  overwrite  u and  y respectively. 
Similarly  the  n- vector  AVp  for  RRLS  could  in  some  cases  be  computed  without 
extra  storage. 

The  work  shown  for  each  method  is  the  number  of  multiplications  per 
iteration.  For  example,  LSQR  requires  3m  + 5n  multiplications.  (A  further 
2n  multiplications  are  needed  to  accumulate  estimates  of  cond(/U  and 
standard  errors  for  x.)  Practical  implementations  of  CGLS  and  RRLS  would 
require  a further  m ♦ n multiplications  to  compute  !j/*.||  and  ||x.|j  for  use  in 

j 

stopping  rules,  although  this  could  be  limited  to  every  tenth  iteration, 
say,  without  serious  consequence. 

T 

All  methods  require  one  product  Av  and  one  product  A v each  iteration. 
This  could  dominate  the  work  requirements  in  some  applications. 
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8.  NUMERICAL  COMPARISONS 

Here  we  compare  LSQR  numerically  with  four  of  the  methods  discussed 

in  section  7,  denoted  by  CRAIG,  CCLS,  RRLS  and  RRLSQR.  The  machine  used 

“12  11 

was  a Burroughs  B6700  with  relative  precision  c = 0.5x8  = 0.7x10 

The  results  given  here  are  complementary  to  those  given  by  Elfving  [6], 
who  compares  CGLS  with  several  other  conjugate -gradient  algorithms  and  also 
investigates  their  performance  on  problems  where  A is  singular. 

8 . 1 Generation  of  test  problems 

The  following  steps  may  be  used  to  generate  a test  problem  min||fo  - Ax\\ 
with  known  solution  x. 

1.  Choose  vectors  x,  y,  a,  c and  diagonal  matrix  D arbitrarily,  with 

ll.vll  = INI  = 7 • (For  any  chosen  m > n,  the  vectors  should  be  of  dimen 
sions  n,  m,  n and  m-n  respectively.) 


2. 

Define 

y = I 

- 

T 

zyy  , 

Z = T - 2zz 

3. 

Compute 

r ~ Y 

'o 

c 

* 

b = Ax  + r . 

The  minimal  residual  norm  is  then  ||r>||  = ||e|| . Since  A and  D have  the  same 
singular  values,  the  condition  of  the  problem  is  easily  specified. 

The  particular  problems  used  here  will  be  called 

P(m,n,d,p) 

to  indicate  dependence  on  four  integer  parameters,  where  d represents 
duplication  of  singular  values  and  p is  a power  factor.  The  matrix  D is 
of  the  form  diagCo.^)  with  each  o.  duplicated  d times.  Specific  values 

t I* 

for  x,  y,  z,  f and  D were  generated  as  follows: 

1.  x - (n-1,  n-2,  n-3,  ...,  2,  1,  0)r. 


38 


r 
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2.  y . = sin (4i\i/m),  z.  = cos (4i\i/n),  followed  by  normal  ization  so  that 

Ill/ll  = Ml  = 1 • 

T 

3.  a - (1/m,  -2/m,  3/m,  ...,  +(m-n)  /m > . 

[i  - 1 +d 


4.  o.  = 

i 


d/n  , where  integer  division  is  used  for  the  term  in 


parentheses.  Choosing  n - qd  to  be  a multiple  of  d leads  to  d copies 

of  each  singular  value: 

2 2 n 


, , 1 1 

(a.)  = 

^ [q  q 


q 


’ q 3 


[For  reference,  this  gives:  - 


Ml  = n(n/3r , 


Ml  = lie 


m - n 
m 


((m-  n)  /3)1  , 


|U lip  = ||£>||f  - (n/2r , cond(A)  = condfW  = (a^/a^)p  = qP  .] 


The  orthogonal  matrices  Y and  Z are  intended  to  reduce  the  possibility 
of  anomalous  numerical  behavior.  For  example,  when  LSQR  was  applied  to 
four  cases  of  the  problem  P(10 ,10 ,1,8)  the  following  error  norms  resulted: 


(m 

Case 

-n=10 , condM ) = 10&) 

k-60 

logl0ll*k 

80 

- x|| 

100 

120 

1 

A = YDZ  (as  above) 

-0.3 

-3.3 

-3.  3 

-3.3 

2 

A = YD 

-0.5 

-3.9 

-3.9 

-4. 1 

Q 

A = DZ 

-2.  1 

-5.9 

-5.9 

-9.2 

B 

Q 

II 

-9.4 

-9.4 

-9.4 

-9.4 

Since  each  case  was  a compatible  system  Ax  = b,  we  normally  would  expect 
an  error  norm  approaching  ||x||  •condf/U  *r  =■  10~2 , so  that  case  1 is  the  most 
realistic.  In  case  2 the  error  was  concentrated  in  the  first  and  second 
components  of  (with  the  remaining  components  accurate  almost  to  working 
precision),  whereas  in  cases  3 and  4 the  final  was  virtually  exact  in 
spite  of  the  high  condition  number  of  A. 
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Although  cases  2-4  represent  less  expensive  test  problems,  it  is 
clear  that  results  obtained  from  them  could  be  very  misleading.  In  the 
following  sections  we  use  only  case  1.  liven  these  test  problems  may  be 
less  than  completely  general.  This  is  discussed  in  section  8.4. 

8.2  Ax  = b;  failure  of  CGLS 

Figure  1 illustrates  the  performance  of  five  methods  on  the  ill- 
conditioned  system  P(  10,10,1,8) , i.e.  m = nzz  10 , one  copy  of  each  singular 
value,  condCA ) = 10  . The  quantities  log  J|r.||  and  log ||x,  - x||  are 

I U a 1 U K. 

plotted  against  iteration  number  k. 

This  ex;imple  distinguishes  the  standard  conjugate -gradient  method 
CGLS  from  the  remaining  methods.  All  except  CGLS  reduced  ||r^||  and 
\\xj__  - x^||  to  a satisfactory  level  before  k = 120. 

Also  apparent  is  the  erratic  behavior  of  ||r*.  ||  for  method  CRAIG,  a 
potential  penalty  for  minimizing  \\x^  - x||  at  each  step  without  regard  to 
||r^||.  In  theory  all  other  methods  minimize  ||r^||  at  each  step  and  also 
reduce  ||x^  - .r||  monoton ically. 

If  any  method  is  to  be  preferred  in  this  exajnple  it  would  be  LSQR, 
since  it  reached  limiting  accuracy  at  iteration  76  and  stayed  at 
essentially  the  same  point  thereafter.  With  values  ATOL  = BTOL  = e the 
stopping  rule  SI  as  implemented  in  LSQR  would  have  caused  iteration  at 
k = 76  as  desired. 

8 . 3 Ax  = b\  _f ailure  of  RRLS 

Figure  2 illustrates  the  same  five  methods  applied  to  a larger 
problem  P( 40 ,10 ,4 ,7 ) , in  which  each  singular  value  is  repeated  four  times 
and  cond (A)  - 107 . In  this  case  all  methods  except  RRLS  reduced  ||r^|| 
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satisfactorily  to  about  10  9 for  k z 100.  For  method  RRLS,  ||r^,||  remained 
-5 

of  order  10  for  k ? 30  up  to  k = 250,  and  zero  digits  of  accuracy  were 
obtained  in  x^. 

A similar  disparity  between  RRLS  and  the  remaining  methods  was 
observed  on  the  problems  P(40,40,4,p) , p = 5,0,  cond^/11  = 10 P.  In  fairness, 
Chen  [41  did  not  intend  RRLS  to  be  applied  to  compatible  systems.  However, 
the  success  of  the  other  least-squares  methods  suggests  that  this  is  not  an 
unreasonable  demand. 

8.4  min|l/kc  - b\\  ; high  accuracy  by  RRLS  and  RRLSQR 

Figure  3 shows  the  performance  of  four  least-squares  methods  on  the 
ill-conditioned  problem  P(20,10,l,6) . Since  cond (A)2  = 7012  = 1/c,  we 
would  normally  expect  at  most  one  digit  of  accuracy  in  the  final  x This 
is  achieved  by  LSQR  and  CGLS,  with  LSQR  showing  a smoother  decrease  of 
\\A\ || . 

In  contrast,  the  residual -reducing  methods  achieved  at  least  six 
digits  of  accuracy  in  x^ . Similarly,  three  or  four  digits  of  accuracy  were 
obtained  on  the  problem  P(20,10,l,8) , for  which  cond (A)  = 10  is  so  high 
that  no  digits  could  be  expected.  At  first  sight  it  may  appear  that  the 
residual -reducing  methods  possess  some  advantage  on  least-squares  problems. 
However,  this  anomalous  behavior  cannot  be  guaranteed;  for  example,  it  did 
not  occur  on  P(80 ,40 ,4 ,6) , as  shown  in  Figure  4.  Also,  the  final  value  of 
\\A  r>J|  is  no  smaller  than  for  LSQR  and  this  is  really  the  more  inportant 
quantity. 

Part  of  the  explanation  for  these  occasional  anomalies  may  lie  in  the 
following.  Suppose  the  original  data  (A,b)  have  solution  and  residual 
(x,r),  while  perturbed  data  ( A * 6/1,  b + 6b)  have  (x  + 6x,  r + 6rQ . If 
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A + 6/1  has  full  column  rank  then  it  is  straightforward  to  show  that 

6x  = (A  + 6A)*(6b  - Mx)  + C (4  + 6>1)T(/1  + 6/1)  ;_16/lTr  . 

In  the  present  example  e - 0.7*10  11 , ||/l||2  = 1,  cond (A)  = 106 , ||fc||  =2.4, 
||x||  = 17,  ||r||  = 7.  If  the  perturbations  were  caused  by  rounding  errors  in 
the  initial  data  then  ||6/l||  = e,  ||6i|j  = e,  and  the  first  term  in  the 
expression  for  6x  could  be  about  as  large  as  10~ 4 in  norm,  and  the  second 
could  be  of  order  7.  Figure  3 suggests  the  second  term  is  unusually  small 
for  the  RRLS  and  RRLSQR  computations.  Looking  at  the  particular  form  of 
the  test  problem,  if  we  write 

y = cr1,y2D,  a = yjiz,  r = y2o, 

we  have 

(A  + 6/U  + = A+  = zTz?_1y 1T, 
and  the  second  term  in  6x  is  effectively 

T -2  T 

Z D Z SA  Y a . 

r>Jow  suppose  6/1  is  simply  an  equivalent  perturbation  in  A caused  when  A 
multiplies  a vector  v in  our  test  case.  Using  the  results  of  rounding 
error  analyses  given  by  Wilkinson  [27], 

(A  + 6A)v  = fl(Y  fl(DfUZv)))  = rri  +&Y^)(D+6D)CZ  +SZJv 
where  ||6Y  ||  = c||y  1(|  etc.,  and  so 

64  = 671  ID  + SDJ  (Z  + 6Z)  + Y}  (D6Z  + 6 D(Z  + 6Z)  ) . 

Using  this  6/1  in  the  second  term  for  6x  effectively  gives 
ZTD'Vj  + D~‘lZ6ZTD)(I  + D-16D)6yjry2c 

which  is  bounded  above  by  about  7 x 10~ 6 in  norm,  rather  than  7 as  expected. 
This  gives  a hint  of  what  might  be  happening  above,  since  a more  realistic 
problem  would  not  admit  such  a relation  between  rounding  errors  and 
residual.  This  does  not  invalidate  the  other  numerical  comparisons,  but  it 
does  emphasize  the  care  needed  when  constructing  artificial  test  problems. 


r , 

I; 

42  ;] 

8 . 5 minlj/lje  - fc||  ; normal  behavior 

Figure  4 illustrates  more  typical  performance  of  the  four  methods, 
using  the  least-squares  problem  P(80,40,4,6)  for  which  condf/lJ  = JO6. 

All  methods  reduced  ||/1  rsj|  to  a satisfactory  level,  and  the  final  error 
norm  is  consistent  with  a conventional  sensitivity  analysis  of  the  least- 
squares  problem;  in  this  case  no  more  than  one  significant  digit  can  be 
expected.  Note  that  CGLS  converged  more  slowly  than  the  other  methods. 

It  also  displayed  rather  undesirable  fluctuations  in  ||/lTrj,!|  considerably  ‘ 

beyond  the  point  at  which  the  other  methods  reached  limiting  accuracy. 

8.6  Some  results  using  greater  precision 

Algorithm  LSQR  was  also  applied  to  the  previous  four  test  problems  on 

I 

an  IBM  370  computer  using  double  precision  arithmetic,  for  which 
"*  1 6 

e = 2.2X10  . With  increased  precision,  LSQR  gave  higher  accuracy  and 

» 

also  required  fewer  steps  to  attain  this  accuracy.  This  is  best  seen  by 

referring  to  the  figures.  In  Figure  1 the  log  of  the  residual  reached 

-14.4  at  the  48 th  step  and  stayed  there;  the  log  of  the  error  was  then  -8.6  t 

but  decreased  20  steps  later  to  -9.3  and  stayed  there.  In  Figure  2 the 

logs  of  the  residual  and  error  were  -13.8  and  -8.0  at  step  44  and  differed 

negligibly  from  these  values  thereafter.  In  Figure  3,  log10||/4  r^ll  = -14.6 

and  - *11  = -6-0  at  k = 32  and  thereafter,  while  in  Figure  4, 

m 

logJM  **kll  = -13.9  and  log10||x^  - x||  = -4.6  at  k = 36  with  little  change 
for  larger  k. 


Figure  1.  An  ill-conditioned  system  Ax  * b,  n = 10,  cond (A)  = 108 . 
CGLS  is  unable  to  reduce  ||r^||  or  ||x^  - x||  satisfactorily. 

CRAIG  exhibits  severe  fluctuations  in  ||r^||. 
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9.  TRANSFORMATIONS 

Here  we  discuss  various  ways  of  transforming  the  problem  min||Ax  - fe|| 
with  a view  to  reducing  the  total  work  required  for  its  solution. 

As  in  the  case  of  symmetric  systems  Ax  = fe  (e.g.  Chandra  [3]),  the 
main  aim  behind  scaling,  partitioning,  splitting,  etc.  is  to  improve  the 
distribution  of  the  singular  values  of  A so  that  at  least  one  of  the 
following  occurs: 

(a)  cond (A)  = a /a  . is  reduced,  where  a . is  the  smallest  nonzero 

v ’ max'  min  ’ min 

singular  value; 

(b)  the  number  of  distinct  singular  values  is  reduced. 

Application  of  a conjugate -gradient  algorithm  to  the  transformed  problem 
should  then  result  in  more  rapid  convergence. 


9. 1 Partitioning  by  columns 


It  sometimes  happens  that  the  matrix  A occurs  naturally  in  the  form 
A = [B  C]  where  the  first  partition  B has  special  structure.  For  exairple, 
B may  be  block  diagonal,  or  the  matrix  BTB  may  be  diagonal  or  tridiagonal. 
In  other  cases  cond  (A)  may  be  large  primarily  because  Condi'S!  is  large. 

We  now  show  that  as  long  as  S has  full  column  rank,  we  can  always  transform 


the  problem  min||[S  C] 


- i>||  into  one  of  better  condition  and  reduced 

size.  The  strategy  is  to  compute  z separately  using  a conjugate -gradient 

method  on  a problem  of  the  form 

min 1 1 TCz  - Tb\\  (9.1) 

2 


for  some  matrix  T and  then  obtain  y using  a direct  method  on  the  problem 
min||Sy  - (b  - Cz)  ||  . (9.2) 

y 

The  form  of  T will  depend  on  how  much  is  known  about  S,  but  in  all  cases 
we  shall  have  condfTY?;  < cond(A)  . 
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In  general,  let  the  QR  factorization  of  B be 
B = IY  Z] 


0 


= YR 


where  Q ~ [7  Z]  is  orthogonal  and  R is  upper  triangular.  If  B has  only 
a few  columns  then  it  is  possible  to  construct  Q and  R cheaply  using  a 
product  of  Householder  transformations  or  a product  of  plane  rotations. 

In  other  circumstances  only  Y and  R will  be  known,  while  if  B is  very  large 
perhaps  only  R can  be  stored  economically.  In  the  latter  case,  it  may  be 

m m 

necessary  to  obtain  R from  the  Cholesky  factorization  B B = R R . 

It  can  now  be  verified  that  any  of  the  following  matrices  may  serve  as 
T in  (9.1): 


T1  = Z 
T = Z 7? 


T = I - YY  , 
z 

T — 1 T — 1 — T T 

T - I - B(B  B)  B = T - BR  R B . 

4 


In  the  first  case,  given  that  Q is  orthogonal,  we  have 
min  || Ax  - b\\  = min||y/lx  - £.b|| 


= nun 


R 

YT(~ 

y 

i — 

i 

0 

zTr 

z Tb 

= rran||Z1V’2  - 77b\\  , 

since  we  c.'in  obtain  y from  Ry  + YTCz  = YTb  for  any  z . (This  is  equivalent  to 
to  (9.2).)  Clearly  we  have  cond(/1)  = cond^/1)  > cond(ZTCi. 

The  result  for  T 2 is  proved  similarly  using  the  fact  that  the  matrix  Q 
below  is  orthogonal.  Thus: 


Q = 


Y ZZT 
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~ 0 

~C 

min||/lx  - L||  = min||Q 

x - Q 

A _ 

A 

H 

T 

Y C 

M 

yTb 

0 

T 

zzc 

w 

7.77  b 

T 

Again  it  is  clear  that  condM)  2 cond(ZZ  C).  In  fact  the  operator 
T 

T 2 = Z Z is  of  no  use  in  practice,  hut  it  serves  to  prove  the  result  for  the 
remaining  cases,  using  the  relations  7'2  = A = A and  f = BA’-1. 

T 

On  numerical  grounds  T = Z should  be  used  whenever  possible, 
particularly  if  cond^Bl  is  large.  Also,  this  is  the  only  operator  for 
which  the  transformed  problem  (9.1)  has  a reduced  row  dimension. 


I f T = T or  T = must  be  used,  it  is  important  to  note  that 
2 T 

T = T = T is  a projection  operator.  When  A and  b are  replaced  by  TO  and 

Tb  in  Bidiag  1,  the  vectors  in  (3.1)  satisfy  Tu . = u- . Hence  the  matrix- 

vector  products  required  in  the  conjugate- gradient  algorithms  are  TCv . and 

1 

just  c7u.  (not  CT7V.). 

Note  also  that  the  product  TCv ^ may  be  regarded  as  the  original 
matrix  A operating  on  a vector,  thus 


TCv.  = Cv . 
t 1 


v 


where  z.  solves  the  least-squares  problem  min||fl2 . - Cv.\\.  Each  iteration 
is  therefore  essentially  the  same  as  for  the  original  problem  min||/lx  - fo|| 
with  the  addition  of  a single  subproblem  involving  the  partition  B.  from 
this  point  of  view,  partitioning  is  analogous  to  preconditioning  symmetric 
systems  Ax  = h via  the  splitting  A = M - N,  in  which  the  only  extra  work 


required  is  the  solution  of  a subproblem  Mz-  = r.  (e.g.  Concus,  Golub  and 
O’Leary  IS),  Chandra  131). 


so 


A trivial  example  of  partitioning  arises  whenever  the  parameters  being 
estimated  include  a constant  term  (the  mean).  In  this  case,  A - [e  C ] 

M 

where  e is  a column  ol  / 1 s . Writing  x = 1 , we  would  then  use 

lAl 

7 T r 

T = (I  - ~ee  ) in  (9.1),  solve  for  z and  then  estimate  the  mean  using 
T 

p = e (b  - Cz)/m.  This  is  nothing  more  than  the  well  known  practice  of 
"subtracting  off  the  mean"  (note  that  Tb  - b - pe,  where  w = « b/m)  and  it 
remains  good  practice  in  the  present  context. 

A very  general  development  of  partitioning  in  a recursive  sense  has 
been  given  by  G.N.  Wilkinson  [261.  From  one  point  of  view  this  covers  the 
situation  where  the  first  partition  B can  itself  be  partitioned. 


9 . 2 Pa  rt i t i on i ng  by  rows 

A simpler  situation  occurs  when  the  matrix  A has  the  form 


where  B is  nonsingular  and  is  such  that  systems  of  equations  involving  B 
and  B 1 can  he  solved  efficiently.  An  algorithm  called  PARTCG  was  developed 
by  Chen  1 4 | for  this  situation,  and  he  reports  excellent  results  on  a large 
practical  problem  for  which  B was  triangular. 


It  is  worth  noting  that  algorithm  PARTCG  can  be  derived  by  transforming 
the  normal  equations  (B  B + CTC)x  = \Bl  CT \b  to  obtain  the  equivalent 


system 


(T  + B~TCTrB~' )y  = [I  A-Vjfc 


(where  Bx  = //)  and  then  applying  the  symmetric  conjugate -gradient  algorithm. 
From  this  point  of  view  it  would  appear  preferable  to  retain  the  least- 
squares  formulation  and  solve  the  problem 


in||  y - b 

CB 


U<>  ..  1 » T «.  i.  _«  n . n • r»r\w 
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This  approach  is  a particular  case  of  preconditioning.  It  is 
discussed  by  Bjorck  in  his  comprehensive  survey  of  direct  and  iterative 
methods  for  sparse  least  squares  [1J.  The  first  work  along  these  lines 
appears  to  be  that  of  Lauchli  [141.  In  general  we  may  expect  the  method 
to  be  successful  only  if  b is  wel  1 -conditioned,  or  if  ||Cfl_1||  is  of  moderate 
size. 

9 . 3 Sea  1 ing  and  preconditioning 

The  simplest  form  of  preconditioning  is  column  scaling,  in  which  the 
original  problem  is  replaced  by 

min||/l%  - b||  , x = Dy  (9.3) 

for  some  diagonal  matrix  D.  Normally  it  must  be  left  to  the  user  to 
compute  D from  his  knowledge  of  A and  to  use  it  appropriately  in  his  matrix- 
product  subroutines.  In  the  absence  of  better  information  the  results  of 
van  der  Sluis  T 24 ) indicate  that  each  column  of  AD  should  have  the  same 
Euclidean  length,  say  7.  Thus  the  J-th  diagonal  of  D is  usually 

T % 

<5 . = 7 / (a.  a ■)  , although  this  should  not  be  at.  the  expense  of  exaggerating 

0 0 0 

the  influence  of  inaccurate  data.  A convenient  method  for  excluding  any 
unwanted  columns  of  / is  to  set  the  corresponding  6 . = 0. 

Note  that  the  original  matrix  A often  contains  a high  percentage  of 
unit  coefficients.  In  such  cases  a user  should  treat  A and  D as  separate 
operators  when  implementing  the  matrix- vector  products,  since  over  a series 
of  5 00  or  1000  iterations  the  computation  of  w = Dv,  p = Am  can  be  substan- 
tially cheaper  than  scaling  the  data  explicitly  and  using  p = (AD)v. 

By  analogy  with  (9.3),  a more  general  approach  to  preconditioning  is 
to  employ  some  stable  factorization  of  / , say 


4 


r>2 


in  which  the  left-hand  factor  L has  better  condition  than  A,  and  the  right- 

hand  factor  U is  nonsingular.  We  assume  that  U will  be  a sparse  matrix  and 

T 

that  systems  of  equations  involving  U and  U can  be  solved  efficiently. 
There  are  now  two  equivalent  but  distinct  possibilities: 

min||Ly  - b\\  , solve  Ux  = y ; (9.4a) 

min||AP  - £>||  , solve  Ux  - y . (9.4b) 

The  advantages  of  each  depend  on  the  origin  of  L and  U . In  some  cases  L 
may  have  the  same  sparsity  as  A,  thus  favoring  (9.4a).  A nontrivial 
example  of  this  is  given  in  section  10.2.  Otherwise  (9.4b)  may  be 
preferred  since  it  does  not  require  L to  be  stored. 

A promising  approach  towards  automating  this  form  of  preconditioning 
is  to  use  Gaussian  elimination  with  row  and  column  interchanges  to  produce 
a conventional  sparse  LU  factorization  with  L trapezoidal  and  U triangular: 

P}AP2  = LU  = N ^ . 

For  convenience  we  shall  ignore  the  permutation  matrices  P1  and  P , but  the 
vital  point  is  that  they  can  be  chosen  so  that  L and  U remain  quite  sparse 
and  that  either  L or  U can  be  kept  reasonably  well -conditioned.  The  use  of 
an  LU  factorization  for  (small,  dense)  least-squares  problems  was  first 
suggested  by  Peters  and  Wilkinson  (22)  with  condf/J  being  kept  small  so 
that  the  normal  equations  L Ly  = L b corresponding  to  (9.4a)  could  be 
solved  without  serious  numerical  difficulty.  In  the  present  context  it  is 
again  condf'U  that  must  be  controlled,  but  this  time  the  principal  motive 

* T " . . " ■- 
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is  to  ensure  reasonably  rapid  convergence  when  a conjugate -gradient 
algorithm  is  applied  to  (9.4). 


The  use  of  a sparse  LU  factorization  for  accelerating  the  convergence 
of  CGLS  was  first  suggested  by  Bjorck  in  LU.  We  give  some  promising 
experimental  results  in  section  10.  Another  form  of  automatic  precondition- 
ing has  been  given  by  Bjorck  and  Elfving  [2]  in  their  algorithm  CGPCNE. 

This  uses  the  SSOR  operator  obtained  from  A A and  has  significant  advantages 
in  terms  of  simplicity  and  efficiency.  (Tor  example,  the  sparse  factoriza- 
tion  of  A A is  performed  implicitly  and  is  therefore  not  subject  to  the 
usual  problems  associated  with  fill-in.)  A future  comparison  of  LU-  and 
SSOR- preconditioning  will  be  of  great  interest.  Note  that  both  methods 
require  explicit  access  to  the  rows  or  columns  of  A (thus  A cannot  be 
represented  in  product  form) , but  this  will  not  often  be  a restriction. 


9.4  Linear  constraints 


Equality-constrained  problems  of  the  form 
min||i4#  - £||  subject  to  Cx  = d 


(9.5) 


arise  in  many  applications.  Without  loss  of  generality  we  may  assume  that 


C has  full  row  rank.  The  combined  matrix  | will  usually  have  full  column 
rank. 

If  a statistical  model  happens  to  be  "overspecified",  i.e.  the  matrix 
A is  known  to  contain  column  dependencies,  the  user  will  normally  wish  to 
impose  some  simple  restrictions  which  do  not  increase  the  residual  norm 
but  do  remove  ambiguity  from  the  solution.  (For  example,  a particular  sub- 
set of  the  parameters  x may  be  required  to  sum  to  zero.)  Thus  with  appro- 
priate C and  d it  will  be  known  in  advance  that 

m rz>i 

min  ||  /lac  - b||  = min||  x - ||  , 

C d 


and  the  constraints  will  be  satisfied  at  the  solution  if  they  are  simply 
included  as  additional  rows  of  A and  b.  Hi  is  procedure  is  easy  to 
implement,  but  it  does  increase  the  size  of  the  least -squares  problem  and 
the  approximations  x k will  not  necessarily  satisfy  Cxk  = d throughout  the 
i terat ions . 

More  generally  (whether  A is  singular  or  not),  a factorization  of  the 
constraint  matrix  may  be  used  to  solve  (9.5).  For  example,  let 


T 

QC  = 


> 

_0  _ 


RT y = d 


Q1'  = [y  Z]  , 


a = b - AYy  , 


where  Q is  orthogonal  and  R is  upper  triangular.  The  unconstrained 


problem 


min||/lZ2 


(9.6) 


can  be  solved  by  a conjugate -gradient  method,  and  the  solution  to  the 

■ ■ , T \y 

original  problem  is  then  x = Q . The  operator  Az  in  (9.6)  should  not 
be  formed  explicitly,  since  Q will  normally  be  computed  as  a product  of 


elementary  transformations.  Thus  Z should  be  used  in  the  form  Z 

T1  «/l 

Similarly,  a - b - AQ  . 


This  procedure  reduces  the  dimension  of  the  least -squares  problem  to 
be  solved,  and  any  iterate  zk  will  give  a point  xk  = QT  that  satisfies 
Cxk  - d.  If  the  constraints  are  nearly  dependent,  the  solution  of  RTy  * d 
is  ill-conditioned  and  this  will  normally  be  revealed  by  growth  in  the 
vectors  y and  a.  However,  the  presence  of  constraints  should  not  impair 
the  convergence  of  a conjugate -gradient  method  on  (9.6),  since  with  Q 
orthogonal  it  is  easily  shown  that  cond(/lZ)  * condf/U  regardless  of  cond(fV, 


55 


If  C is  large  and  sparse  it  would  be  more  efficient  to  compute  a 
stable  triangular  factorization  of  the  form 

“T-  [«] 

using  Gaussian  elimination  with  row  and  column  interchanges.  The  above 

procedure  may  then  be  applied  with  L and  11  in  place  of  Q and  R.  In  this 
tI'o'’ 

case  we  have  Z = L | j and  the  bound  on  the  condition  of  the  reduced  problem 
is  weakened  to  condlAZj  < cond (A)  condf 'Zl  <.  condf/i;  condf 'Ll.  This  may  not 
be  of  serious  consequence  since  in  practice  the  interchange  strategy  can 
usually  keep  condfZJ  moderate  while  maintaining  sparsity  in  L and  U. 
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10.  AN  APPLICATION  IN  GEOPHYSICS 

The  Fortran  implementation  of  LSQR  has  been  applied  recently  to  some 
least-squares  problems  arising  from  the  analysis  of  gravity-meter  observa- 
tions (Woodward  [28]).  Hie  formulation  of  the  underlying  model  is  similar 
to  that  described  by  Reilly  [23].  (The  purpose  is  to  estimate  the  mean  ' 

value  of  gravity  at  each  of  a series  of  stations  and  to  determine  a set  of 
instrumental  parameters  for  each  gravity  meter,  including  drift  rate  and  a 
correction  to  the  calibration.) 

We  have  taken  one  such  set  of  observations  to  obtain  a representative 
test  problem 

min||/lx  - b||  , A = AJ)  (10.1) 

1 

in  which  A is  1033  by  434  with  each  row  containing  5 nonzero  coefficients, 
the  first  two  being  unity.  The  diagonal  matrix  D was  chosen  to  normalize 
the  columns  of  Aq  and  to  exclude  some  containing  insufficient  data.  Ignoring 
excluded  columns,  A is  effectively  1033  by  320  and  of  full  rank.  Since 

||b|!  * 0000  and  ||r*||  = ||£  - /lx||  = 0.  75  for  the  solution  x,  the  system  being  l 

solved  is  almost  compatible. 

The  times  quoted  below  are  seconds  of  processor  time  using  unoptimized 

Fortran  code  on  a Burroughs  B6700.  All  condition  number  estimates  are 

those  obtained  by  LSQR.  The  stopping  rule  S2  was  used  with  ATOL  = 10~ 8 

(see  section  6).  Thus  for  a given  operator  A (which  in  some  cases  was  L or 

All  ')  a solution  was  accepted  when  the  estimate  of  ||/lTr*^|| / f ||/1  |j  ||r^||7 
“8 

decreased  below  10 

in  (10.1),  the  rate  of  convergence 
shows  a plot  of  log  ||rfc||  for  the 
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first  600  iterations,  and  the  steady  growth  of  the  estimate  of  cond (A). 

After  1600  iterations  the  run  was  terminated  with  ||r^||  and  ||>1Tr>^||A||>l||  \\r^\\) 
still  as  large  as  0.92  and  JO-4  respectively.  (Thus  ||r,||  had  not  yet  entered 
the  usual  long  flat  "tail".)  It  appeared  that  cond(/U  = JO5  and  that  well 
over  2000  iterations  would  have  been  required  to  reach  a satisfactory 
solution. 

As  an  aside,  this  was  the  longest  run  performed  and  therefore  best 
illustrates  the  accuracy  of  the  norm  estimates  in  section  5.  After  1600 
iterations  the  estimates  of  ||r^,j| , ||/1  r^,||  and  ||x^]|  were  correct  to  eight, 
five,  and  eight  digits  respectively,  a very  satisfactory  result.  The  value 
- ^6  over-estimated  ||/l||^,  = J 320  - 18  by  a noticeable  amount  in  spite 
of  the  bound  (5.8).  This  is  typical  and  in  fact  desirable,  given  the 
manner  in  which  ||a||  is  used  in  the  three  stopping  criteria  of  section  6. 

10.2  Preconditioning  with  LU  factors 

In  order  to  test  the  preconditioning  in  (9.4),  a sparse  LU  factoriza- 
tion of  A was  computed  in  the  form 

(10.2) 

using  Gaussian  elimination  with  row  interchanges.  The  nonzeros  in  U were 
stored  compactly  by  rows,  and  the  rows  of  A were  eliminated  one  by  one  in 
their  natural  order.  Thus  each  L.  represents  a stabilized  elementary 
transformation  of  the  form 

1 
X 

0 

suitably  imbedded  in  the  identity  matrix  of  order  rr  ♦ n,  and  several  L. 

J 


or 


2 V 


(10.5) 


•W 1 


r 

1°. 


5R 


were  required  to  eliminate  each  row  in  turn.  This  gives  /\  = jjj  where  L 
is  stored  in  product  form  as  the  operator 

rr 

L - [0 

i n 


The  multipliers  were  constrained  to  satisfy  | A . | 7/t  for  some  pivot 

tl 

tolerance  ti  (0,1),  thus  allowing  the  usual  compromise  between  sparsity 
and  stability.  Since  L and  U proved  to  be  very  sparse,  x = 0.99  was  chosen 
to  minimize  condi'Ll.  Ihe  number  of  nonzeros  in  A,  L and  U were  about  5600, 
6500  and  1500  respectively,  and  the  factorization  time  of  10  seconds  was 
negl igible. 

.Some  statistics  for  the  three  equivalent  forms  of  problem  (10.1)  are 
summarized  in  Table  1.  It  is  clear  that  LU  preconditioning  leads  to  a 
significant  reduction  in  computation  time  for  this  particular  problem, 
with  only  a moderate  increase  in  storage  requirements. 


Operator 

Condition 

(estimate) 

Total 

iterations 

Time  per 
iteration 

Total 

time 

A 

HO5 

s 2000 

1.0 

>2000 

L 

3500 

390 

2.4 

920 

Al)-y 

3500 

400 

1.  3 

520 

Table  1.  Application  of  LSQR  to  three  equivalent  problems: 
min||/1x  - i>||  , min||Ly  - i>|| , min||4y“\v  - b\\  . 


Note  that  in  an  LIJ  factorization  with  row  interchanges,  condfzj  is  not 

directly  related  to  cond (A) . (For  example,  L itself  is  independent  of  the 

column  scale  of  A.)  In  this  case,  cond (L)  - .16 00  is  rather  higher  than  we 

might  have  wished.  This  is  partly  due  to  the  columns  of  unit  coefficients 

in  A which  generate  many  multipliers  that  are  "worst  possible",  i.e.  |A.|  = 1. 

J 


J 
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Better  numerical  properties  could  be  expected  if  the  elimination  were 
performed  column-wise  rather  than  row  wise  (cf.  Wilkinson  [27,  pp.  162- 
1661).  However,  a more  complex  data  structure  would  then  be  required  for 
efficiency,  and  since  many  of  the  A . would  still  be  of  order  1 the  reduction 
in  cond ( L ) could  prove  to  be  only  slight. 

In  this  context  it  is  nonmil ly  cond (U)  that  reflects  the  condition  of 
A.  There  was  no  growth  in  our  particular  U (largest  off-diagonal  element 
e 0.7)  but  the  largest  and  smallest  diagonals  of  U were  1.0  and  0.00021  res- 
pectively, giving  the  conservative  estimate:  cond(f/j  = 5000. 

Perhaps  the  most  surprising  result  is  that  the  numerical  performance 
of  the  operator  AU  1 was  not  significantly  different  from  that  of  L.  This 
means  that  the  bound  condMP-1)  < cond(/U  condiW  - 108  is  fortunately  not 
operative,  even  though  A and  //-1  are  applied  as  separate  operators.  It  is 
an  open  question  how  far  this  would  remain  true  as  cond(A)  and  cond((/A 
increase,  but  as  long  as  it  does  remain  true  a y-1  is  much  less  expensive 
to  use. 

The  speed  advantage  of  AU  1 over  L is  largely  due  to  the  unit  coeffi- 
cients in  /I  and  to  the  intrinsic  simplicity  of  the  relevant  subroutines. 
This  could  be  expected  in  other  practical  applications. 

Figure  5 compares  the  decrease  of  ||r^,||  for  the  operators  A and  A(~\ 
the  latter  being  indistinguishable  from  L. 

10.3  A p rob  1 em- dependent  t rans format  ion 

Prior  to  implementat ion  of  the  above  preconditioning  the  slow  conver- 
gence of  LSQR  on  (10.1)  necessitated  a reconsideration  of  the  problem 

formulation.  Briefly,  it  was  noted  from  the  structure  of  A that  the 

o 


nonunit  coefficients  could  be  translated  to  give  a more  favorable 
distribution  around  zero  (Woodward  [281).  Algebraically,  the  columns  of  A f 
occurred  in  pairs  of  the  form  [e  al  in  which  the  vectors  e and  a 
possessed  the  same  sparsity  pattern  (with  e containing  unit  coefficients). 
The  Shift  a - a - \ie  was  applied  to  each  such  pair,  where  u = c a/e  e 
was  the  mean  of  the  nonzeros  in  a.  This  amounted  to  a column  transformation 


te  al  = [e  al 


-u 

1 


such  that  e and  a were  orthogonal.  The  usual  column  scaling  then  made  each 
pair  of  columns  orthonormal. 

The  effect  here  is  another  preconditioning  via  the  factorization 
Af  - 1 in  which  T is  unit  upper  triangular  and  Ar  has  the  same  sparsity 
as  A . Convergence  of  LSQR  on  the  problem 

min!|/.r  - b\\  , A = A D (10.4) 

proved  to  be  greatly  accelerated,  as  shown  in  Figure  6.  Compared  to  the 
original  problem  (10.1),  the  reduction  in  cond (A)  was  clearly  important. 
From  the  sharp  drops  in  Hr^H  it  seems  that  another  contributing  factor  was 
an  increased  repetition  of  singular  values.  Unfortunately  a further 
"automatic"  preconditioning  via  A = ~LU  appeared  to  negate  the  latter 
effect,  as  shown  in  Figure  6.  The  smooth  decrease  in  ||r^||  and  the  long 
"tail"  before  stopping  criterion  S2  was  satisfied  indicates  that  L = AU  1 has 
its  singular  values  spread  rather  uniformly  throughout  their  range. 

Some  statistics  for  the  transformed  problem  (10.4)  are  summarized  in 
Table  2.  Again  the  operators  Land  ~Alfy  have  virtually  the  same  numerical 
propert ies. 
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Iteration  number,  k 

Figure  5. 

Comparison  of  min||Ax  - b\\  and  min||i4y-1y  - b|| 
for  problem  (10.1). 

1 = log  Hr*  ||  for  operator  A . 

2 = logrJ|r^||  for  operator  AU  1 . 

3 = estimate  of  cond (A). 

4 = estimate  of  condfAJ/  1). 

The  nunbers  marked  on  2 are  those  tested  against  ATOL  in  stopping 
criterion  S2,  i.e.  estimates  of  ||£Trfc||/(||L||  ||r^||;  where  L = AU~' . 
Results  for  the  operator  L are  essentially  the  same  as  for  . 

Iteration  number,  k 


Figure  6.  Similar  comparison  for  the  transformed  problem  (10.2) 

1 = log  ||r>jJ|  for  operator  A. 

2 = log  ||r^||  for  operators  AU~^ , L. 

3 = estimate  of  condrtD. 

4 = estimate  of  condl'Jw-1)  = condfZ). 


Operator 

Condition 

Total 

iterations 

Time  per 
iteration 

Total 

time 

A 

Z100 

250 

1.0 

250 

Z 

Z200 

Z76 

2.4 

880 

a/t1 

Z100 

Z71 

1.3 

460 

Table  2.  Application  of  LSQR  to  three  equivalent 
forms  of  problem  (10.4). 
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10.4  Comparison  with  QR  factorization 

The  transformed  problem  (10.4)  was  also  solved  directly  using  a sparse 
orthogonal  factorization 


0~ 

~/?'n 

! 

C 

Q 

= 

, Q 

= 

0 _ 

M 

d_ 

where  Q represents  a product  of  plane  rotations  analogous  to  the  row-wise 
elimination  (10.2).  The  factorization  was  reasonably  sparse:  Q = '67>000 
plane  rotations,  /?  = 9000  nonzeros.  It  was  implemented  as  efficiently  as 
possible  subject  to  processing  the  rows  of  A in  their  natural  order.  Some 
ecconomies  could  be  achieved  using  different  row  orderings,  e.g.  Gentleman 
[81,  but  the  factorization  time  of  380  seconds  is  representative  of  the 
great  expense  of  QR-  compared  to  LU- factorization.  Computation  of  the 
standard  error  estimates  (from  R - see  section  5.4)  required  a further  160 
seconds . 

Comparison  with  the  operator  A in  Table  2 shows  that  LSQR  furnished 
an  equally  accurate  x and  rather  less  accurate  standard  errors  in  about 
half  the  time  required  by  the  direct  method.  The  storage  requirements  (and 
incidentally  the  paging  activity  in  the  B6700  virtual  memory  environment) 
were  also  substantially  less.  These  advantages  of  the  conjugate-gradient 
approach  to  least  squares  must  become  more  apparent  with  increasing  problem 
size,  as  long  as  the  relevant  matrix  operator  remains  well  conditioned. 

10.5  A larger  problem 

Finally,  to  indicate  the  effect  of  problem  size,  we  note  some  results 
for  a similar  example  which  contained  approximately  twice  as  many  observa- 
tions (and  hence  twice  as  much  data  defining  A).  Column  means  were 
subtracted  as  before,  giving  problem  (10.4)  with  A 1880  by  8Pf>. 


I 

. 
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In  solving  this  case,  LSQR  obtained  the  estimate  condfjj  ~ MOO , 
essentially  the  same*  as  for  the  smaller  problem.  About  BOO  iterations 
and  WOO  seconds  of  processor  time  were  required,  a four-fold  increase  in 
total  work. 
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11.  SUNMARY 

The  aim  has  been  to  present  the  derivation  of  an  algorithm  and  details 
of  its  implementation,  along  with  sufficient  experimental  evidence  to  suggest 
that  the  algorithm  compares  favorably  with  other  similar  methods  and  that  it 
can  he  relied  upon  to  give  satisfactory  numerical  solutions  to  problems  of 
practical  importance. 

Reliable  stopping  criteria  were  regarded  here  as  being  essential  to  any 
iterative  method  for  solving  the  problems  Ax  - b and  min||/lx  - h||  . The 
criteria  developed  for  LSQR  may  be  useful  for  other  solution  methods. 

Estimates  of  l|/t||  , cond(/U  and  standard  errors  for  x have  also  been  developed 
to  provide  useful  informat  ion  to  the  user  at  minimal  cost. 

Finally,  some  results  obtained  using  a sparse  LU  factorization  of  A 
illustrate  an  effective  method  for  accelerating  convergence  on  ill-conditioned 
problems. 


ACKNOWLEDGEMENTS 

We  wish  to  thank  Dr  Robert  Davies  of  the  DSIR  Applied  Mathematics 
Division  for  his  assistance  and  advice  on  many  aspects  of  this  work.  We 
are  also  grateful  to  Dr  Derek  Woodward  of  DSIR  Geophysics  Division  for 
providing  details  of  his  analyses  of  gravity-meter  observations,  and 
to  the  Computer  Services  Centre  at  the  Victoria  University  of  Wellington 
(and  to  Burroughs  Corporation)  for  providing  the  necessary  confuting 
facilities. 


REFERliNClvS 


66 


I 

1 


1.  BJORCK,  A.  Methods  for  sparse  linear  least  squares  problems.  In 
BUNCH,  J.R.  and  ROSFi,  D.J.  (lids.),  Sparse  Matrix  Computations, 
Academic  Press,  New  York,  1976,  177-199. 

2.  BJORCK , \.  AND  ELFVING,  T.  Accelerated  projection  methods  for 
computing  pseudoinverse  solutions  of  systems  of  linear  equations. 
Research  Report  LiTH-MAT-R- 1978-5,  Department  of  Mathematics , 
Linkop ing  University,  .Sweden,  1978. 

3.  CHANDRA,  R.  Conjugate  gradient  methods  for  partial  differential 
equations.  Research  Report  129,  Department  of  Computer  Science, 
Yale  University,  1978. 


4.  CHEN,  Y.T.  Iterative  methods  for  linear  least-squares  problems. 
Research  Report  CS-75-04,  Department  of  Computer  Science,  University 
of  Waterloo,  1975. 

5.  CONCUS,  P. , GOLUB,  G.H.  AND  O'LEARY,  D.P.  A generalized  conjugate 
gradient  method  for  the  numerical  solution  of  elliptic  partial 
differential  equations.  In  BUNCH,  J.R.  and  ROSE,  D.J.  (Eds.), 

Sparse  Matrix  Computations,  Academic  Press,  New  York,  1976,  309-332. 

6.  ELFV1NG,  T.  On  the  conjugate  gradient  method  for  solving  linear 
least-squares  problems.  Research  Report  LiTH-MAT-R- 1978-3,  Department 
of  Mathematics,  Linkoping  University,  Sweden,  1978. 

7.  FADDEKV , D . k . AMD  EADDEEVA,  V.N.  Computational  Methods  of  1. inear 
Algebra.  Freeman  and  Co.,  London,  1963. 


8.  GENTLEMAN,  M.W.  Regression  problems  and  the  QR  decomposition. 

Bulletin  of  the  Institute  of  Mathematics  and  Its  Applications  10  (1974), 
195-197. 


67 


j 


♦ 

I 


I 

* 


I 


9.  GOLUB,  G.H.  Numerical  methods  for  solving  linear  least-squares 
problems.  Numerische  Mathematik  7 (1965),  206-216. 

10.  GOLUB,  G.H.  ANL)  KA11AN,  W.  Calculating  the  singular  values  ;ind  pseudo- 
inverse of  a matrix.  SIAM  J.  Numer.  Anal.  2 f 1 965 ) , 205-224. 

11.  HESTENES,  M.R.  AND  STIEFEL,  E.  Methods  of  conjugate  gradients  for 
solving  linear  systems.  J.  Res.  Nat.  Bur.  Standards  49  f 1952) , 

409-456. 

12.  HOUSEHOLDER,  A.S.  Terminating  and  non- terminating  iterations  for 
solving  linear  systems.  SIAM  .J.  Appl . Math.  5 (1955),  67-72. 

13.  LANCZOS,  C.  An  iteration  method  for  the  solution  of  the  eigenvalue 
problem  of  linear  differential  and  integral  operators.  J.  Res.  Nat. 

Bur.  Standards  45  (1950),  255-282. 

14.  LAUCHLI,  P.  Jordan-elimination  und  Ausgleichung  nach  kleinsten 
Quadraten.  Numerische  Mathematik  3 (1961),  226-240. 

15.  LEWIS,  J.G.  Algorithms  for  sparse  matrix  eigenvalue  problems. 

Research  Report  STAN-CS-77- 595 , Stanford  University,  1977. 

16.  NASHED,  M. Z.  Aspects  of  generalized  inverses  in  analysis  and 
regularization.  In  NASHED,  M.Z.  (Ed.),  Generalized  Inverses  and 
Appl ications.  Academic  Press,  New  York,  1976,  193-244. 

17.  NELDER,  J.A.  GLIM  Manual . Numerical  Algorithms  Group,  13  Banbury 
Road,  Oxford,  1975. 

18.  PAIGE,  C.C.  Bidiagonal izat ion  of  matrices  and  solution  of  linear 
equations.  SIAM  .J.  Numer.  Anal.  11  (1974),  197-209. 

19.  PAIGE,  C.C.  Error  analysis  of  the  Lanczos  algorithm  for  tridiagona- 
lizing  a symnotric  matrix.  .J.  Inst.  Maths.  Applies.  18  (1976),  341-349. 


I 


20.  PA I a;,  C.C.  AND  SAUNDERS,  M.A.  Solution  of  sparse  indefinite  systems 

of  equations  and  1 east  - squares  problems.  Research  Report  STAN-CS-73-399, 

Stanford  University,  1973. 

21.  PAIGE,  C.C.  AND  SAUNDERS,  M.A.  Solution  of  sparse  indefinite  systems 
of  linear  equations.  SIAM  J.  Numer.  Anal.  12  (1975),  bl7-629. 

22.  PETERS,  G.  AND  WILKINSON,  J.H.  The  least-squares  problem  and  pseudo 
inverses.  Computer  J.  1.3  (1970),  309-316. 

23.  REILLY,  W.I.  Adjustment  of  gravity  meter  observations.  New  Zealand 
J.  of  Geology  and  Geophysics  13  (1970),  697-702. 

24.  VAN  DER  SLUIS,  A.  Condition  numbers  and  equilibration  of  matrices. 

Numerische  Mathematik  14  (1969),  14-23.  J 

25.  STEWART,  G.W.  Research,  development  and  LINPACK.  In  RICE,  J.R.  (Ed.), 

Mathematical  Software  III,  Academic  Press,  New  York,  1977,  1-14. 

26.  WILKINSON,  G.N.  A general  recursive  procedure  for  analysis  of  variance, 
fiiometrika  57  (1970),  19-46. 

27.  WILKINSON,  J.H.  The  Algebraic  Eigenvalue  Problem.  Oxford  University 
Press  (Clarendon),  London  and  New  York,  1965. 

28.  WOODWARD,  D.J.  Private  communication.  Geophysics  Division,  DSIR, 

Wellington,  1978. 


APPENDIX  A:  FORTRAN  PROGRAMS 


69 

Listings  are  given  here  of  the  following  Fortran  routines: 

1.  LSQR  - the  principal  subroutine  inplementing  the  sparse 

least-squares  algorithm. 

2.  Auxiliary  routines  required  by  LSQR. 

NORMLZ  - normalize  a vector  to  unit  length. 

SAXPY  - compute  fix  + y for  scalar  a and  vectors  x,  y. 

SNRM2  - compute  the  euclidean  norm  of  a vector. 

ATIMES  - a sample  routine  for  computing  the  product  p = Av. 

m 

ATRANS  - a sample  routine  for  computing  the  product  p = A u. 

3.  Routines  for  generating  a test  problem. 

HPROI)  - apply  a Householder  transformation  to  a vector  according 
to  p = (I  - 2zzT)v,  where  ||i:||  = 7. 

LSTP  - generate  a least-squares  test  problem  P(<nyn,d,p)  of  the 
form  described  in  section  8.1  of  this  paper. 

TEST  - request  generation  of  a specific  test  problem  and  then 
solve  it  using  L.SQR. 

4.  A sample  main  program  calling  TEST. 

These  routines  are  written  in  the  PFORT  subset  of  ANSI  Fortran  (Ryder 
and  Hall  (21).  There  are  no  machine -dependent  constants.  Conversion  between 
single  and  double  precision  is  accomplished  by  interchanging  the  following 
cha  rac  te  r s t h roughout : 


ABS 

< > 

DABS 

REAL 

< > 

REAL* 8 or  DOUBLE  PRECISION 

SQRT 

< > 

DSQRT 

The  routines  SAXPY  and  SNRM2  correspond  to  two  members  of  the  BIAS  collection 
(Lawson,  Hanson,  et  al  [1]).  For  improved  efficiency  the  BIAS  versions  should 
be  used  when  available.  The  appropriate  double  nrecision  routines  would  then 
be  DAXPY  and  DNRM2. 

[1]  LAWSON , C.L. , HANSON,  R.J. , KINCAID,  D.R.  AND  KR0G1,  F.T.  Basic  Linear 
Algebra  Subprograms.  Research  Report  SAND77-0898,  Sandia  Laboratories, 

Albuquerque,  New  Mexico,  1977. 

[2]  RYDER,  B.G.  AND  HALL,  A.D.  The  PFORT  Verifier.  Computing  Science 
Technical  Report  12,  Bell  Telephone  Laboratories,  Murray  Hill,  New 
Jersey,  1973  (revised  1975). 
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SUBROUTINE 

1 

2 


LSQR  ( M,N,MAX,P,P,U,V,W,X,SE, 

ATOL,  BTOL,CONLTM  , I TNM  M , NOIJT  , 
TSTOp, RNORM , ANORM , ACOND  ) 


INTEGER 

REAL 

1 


M,N,MAX,  I'T’NLIN  ,N0l)T,  TSTOP 

B(M)  , p ( MAX ) ,U(M)  ,V( N)  , w ( N ) , X(N)  , SE(N)  , 

ATOL, BTOL.CONLIM , RNORM , A NORM , AOOMn 


LSQR  FINDS  A SOLUTION  X TO  ^HF  FOLLOWING  PROBLFMS . . . 
(UNSYMMETRIC  EQUATIONS)  SOLVE  A.X  = B, 

(LINEAR  LEAST  SQUARES)  MTNTMT7F  NORM ( R ) WHERE  R = B - A.X, 

TN  WHICH  A IS  A REAL  MATRIX  OF  DIMENSIONS  M BY  N AND  B IS  A 
G I VEN  M- VECTOR . NORMALLY  RANK (A)  SHOULD  BE  N,  WTTH  M >=  N, 

RUT  THIS  IS  NOT  ESSENTIAL  (E.G.  SOME  OF  THE  ROWS  AND  COLUMNS  OF 
A MAY  RF  ZERO) . 

THE  MATRIX  A TS  INTENDFD  TO  BE  LARGE  AND  SPARSE.  IT  IS  ACCESSED 
BY  MEANS  OF  TWO  SUBROUTINE  CALLS  OF  THE  FORM 


C 


CALL  ATTMES ( V,P,M,N  ) 
CALL  ATRANS ( U,P,M,N  ) 


C WHICH  MUST  RETURN  THE  PRODUCTS  P = A.V  AND  P = A (TRANSPOSE)  .11 

C FOP  GTVFN  TNPUT  VECTORS  V AND  U. 

r 

C SUBROUTINES  ATTMES  AND  ATRANS  ARE  TO  BE  SUPPLIED  BY  THE  USER . 

C THEY  MUST  NOT  ALTER  V OR  U RESPECTIVELY.  THE  LENGTH  OF  THE 

C OUTPUT  VECTOR  P TS  RESPECTIVELY  M AMD  N,  PU^  TN  EITHER  CASE 

C THE  ARRAY  PARAMETER  P MAY  BF  DECLARED  TO  HAVE  LENGTH  MAX(M,N) . 

C (THIS  MAY  BE  USEFUL  TF  P IS  USED  FOR  WORKSPACE  BEFORE  EXTT . ) 


C NOTE.  THE  NUMBER  OF  ITERATIONS  REQUIRED  BY  LSOP  DEPENDS 

C CRITICALLY  ON  THE  CONDITION  NUMBER  OF  THE  MATRIX  A.  POOR 

C SCALTNC  OF  THE  COLUMNS  OF  A SHOULD  BE  AVOIDED.  THIS  CAN 

C USUALLY  BE  TAKEN  CARE  OF  WHEN  PROGRAMMING  SUBROUTINES  ATIMES 

C AND  ATRANS.  IN  THE  ABSENCE  OF  BETTER  INFORMATION  THE  NONZEPO 

C COLUMNS  OF  A SHOULD  BE  SCALED  SO  THAT  THEY  ALL  HAVE  THE  SAME 

C EUCLIDEAN  NORM  (USUALLY  1.0). 

C 

C 

C PARAMETERS 

C 


r 

C 

r 

M 

INPUT 

THE 

NUMBER 

OF  ROWS  IN  A. 

C 

C 

C 

C 

N 

INPUT 

THE 

NUMBER 

OF  COLUMNS  TN  A. 

MAX 

TNPUT 

MAX 

OF 

(M,N)  . 

P TN 

USED  ONLY  TO  SPECTFY  THE  LENGTH 
THE  DIMENSTON  STATEMENT  ABOVE. 

I 


L 
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C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


B(M) 

P ( MAX ) 

u (M) 
V(N) 
W(N) 

X (N) 

SE  ( N) 


ATOL 


BTOL 


INPUT 

OUTPUT 

OUTPUT 

WORKSPACE 

WORKSPACE 

OUTPUT 

OUTPUT 


THE  RHS  VECTOR  OF  LENGTH  *1 . 

RETURNS  THE  VECTOR  A (TRANSPOSE) . R , WHFRE 
R = B - A . X TS  THE  RESIDUAL  VECTOR  CORRES- 
PONDING TO  THE  COMPUTED  SOLUTION  X. 

RETURNS  THE  RESIDUAL  VECTOR  P = B - A.X. 


RETURNS  THE  COMPUTED  SOLUTION  X. 

RETURNS  STANDARD  ERROR  ESTIMATES  FOP  THE 
COMPONENTS  OF  X. 


NOTE.  THE  FOLLOWING  DISCUSSION  OF  TOLERANCES  TS 
IN  TERMS  OF  A OUANTITY  EPS  WHTCH  STANDS  FOR  THE 
RELATIVE  PRECISION  OF  FLOATING-POINT  ARITHMETIC 
ON  THE  MACHINE  BEING  USED.  TYPICAL  VALUES  ARF 


c 

BURROUGHS  B6700 

SINGLE 

PRFCTSTON, 

FPS 

= 

1 .PIE-11 

c 

CDC 

6600 , 7600 

SINGLE 

PRECISION, 

EPS 

= 

1 . 0E-1 4 

c 

IBM 

360 , 370 

STNGLF. 

PRECISION , 

EPS 

= 

1 . 0F-6 

c 

DOUBLE 

PRECISION, 

EPS 

= 

1 . 0D-1 6 

c 

UNIVAC  1100  SERIES 

SINGLE 

PRECISION , 

EPS 

= 

1 . 0E-R 

c 

DOUBLE 

PRECISION, 

EPS 

= 

1 .0D-1 8 

(ALL  VALUES  APPROXIMATE) 


INPUT 


INPUT 


CONLIM  INPUT 


AN  ESTIMATE  OF  THE  RELATIVE  ACCURACY  OF 
THE  DATA  DEFINING  THE  MATRIX  A. 

ATOL  WILL  NORMALLY  BE. IN  THE  RANGE 
EPS  TO  SORT (EPS) . 

SUGGESTED  VALUE  — ATOL  = 10O0*EPS 

AN  ESTIMATE  OF  THE  RELATTVF  ACCURACY  OF 
THE  DATA  DEFINING  THE  RHS  VECTOR  B. 

BTOL  WILL  NORMALLY  BE  TN  THE  RANGE 
EPS  TO  SORT (EPS) . 

SUGGESTED  VALUE  — BTOL  = 1 ^ PI  PJ  * EPS 

AN  UPPER  LIMIT  ON  COND(A) , THE  APPARENT 
CONDITION  NUMBER  OF  THF  MATRIX  A (IGNORING 
KNOWN  SINGULARITIES) . ITERATIONS  WTLL 
TERMINATE  IF  A COMPUTED  ESTIMATE  OF  COND(A) 
EXCEEDS  CONLIM.  THIS  TS  INTENDED  TO  PREVENT 
CERTAIN  SMALL  OR  ZERO  SINGULAR  VALUES  OF  A 
FROM  COMING  INTO  EFFECT  AND  CAUSING  UNWANTED 
GROWTH  TN  THE  COMPUTED  X.  HFNCF  TT  MAY  ASSTCT 
IN  REGULARIZING  T LL-CONDTT TONED  SYSTEMS. 
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! 


> 


c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

Q 

r 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

Q 

c 

c 

r 

r 

C 

C 

r 

C 

r 

r 

C 

C 

C 

r 

C 

C 

C 

C 

r • 

V* 

c 

c 

c 

c 

c 

c 

c 

c 


CONLIM  W T LI,  NORMALLY  B P TN  THR  RANGR 
Id  51  PI  TO  1/RPS. 

SUGGESTED  VALUE  — 

CONLTM  = 1 / (1  000*EPS)  FOR  COMPATIBLE  SYSTEMS  , 
CONLIM  = 1/(1 0 *SORT ( EPS)  ) FOR  LEAST  SQUARES. 

NOTE.  TF  THE  USER  IS  NOT  CONCERNED  ABOUT  THE  PARAMETERS 
ATOL,  BTOL  AND  CONLTM,  ANY  OR  ALL  OF  THEM  MAY  "FI  SET 
TO  7 ERO . 

ITNLIM  INPUT  AN  UPPER  LTMTT  ON  THE  NUMBER  OF  ITERATIONS. 

SUGGESTED  VALUE  — 

ITNLIM  = N/2  FOP  WELL-CONDTT TONED  SYSTEMS, 

ITNL  T M = 4 *N  OTHERWISE. 


NOUT  INPUT  LINEPRTNTEP  UNIT  NUMBER.  IF  POSITIVE, 

A SUMMARY  WILL  RE  PRINTED  ON  UNIT  NOUT. 


I STOP  OUTPUT 
0 


1 


2 


I 


4 

5 


7 


RNORM  OUTPUT 


AN  INTEGER  C-TVING  THF  REASON  FOR  TERMINATION... 

X = 0 IS  THE  EXACT  SOLUTION. 

NO  ITERATIONS  WERE  PERFORMED. 

THE  EQUATIONS  A . X = B ARE  PROBABLY 
COMPATIBLE.  NOPM(R)  TS  SUFFICIENTLY  SMALL 
GIVEN  THE  VALUES  OP  ATOL  AND  BTOL. 

THE  SYSTEM  A . X = B IS  PROBABLY  NOT 
compatible.  A least-squares  solution  HAS 

BEEN  OBTAINED,  ^OR  WHTCH  NORM ( A . R1  TS 
SUFFICIENTLY  SMALL  GTVEN  THE  VALUE  OF  ATOL. 

THE  SYSTEM  IS  MORE  T LL-CONDTTIONED  THAN 
EXPECTED.  AN  ESTIMATE  OP  COND ( A ) map 
EXCEEDED  CONLIM. 

THE  ITERATION  LTMIT  ITNLTM  WAS  REACHED. 

THE  EQUATIONS  A . X = B APF  PROBABLY 
COMPATIBLE.  NORM ( R1  TS  AS  SMALL  AS  SEEMS 

REASONABLE  ON  THTS  MACHINE. 

THE  SYSTEM  A . X = B TS  PROBABLY  NOT 
COMPATIBLE.  A LEAST-SQUARES  SOLUTION  HAS 
BEEN  OBTAINED,  FOR  WHTCH  NORM  I A . R ) TS  AS 
SMALL  AS  SEEMS  REASONABLE  ON  THTS  MACHINE. 

COND ( A ) SEEMS  TO  BE  SO  LARGE  THAT  THERE  TS 
NOT  MUCH  POIN'R  TN  DOING  FURTHFR  ITERATIONS, 
GTVEN  THE  PRECISION  OF  THTS  MACHINE. 

NORM ( R)  = SQRT(R.R),  THE  NORM  OF  THE  FINAL 
RESIDUAL  VECTOR  R = B - A.X. 


I 
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C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


ANORM  OUTPUT  AN  ESTIMATE  OF  THE  FRORENTUS  NORM  OF  a. 

THIS  IS  THE  SOUARE-ROOT  OF  THE  SUM  OF  SOUARFS 
OF  THE  ELEMENTS  OF  A.  TF  THE  COLUMNS  OF  A 
HAVE  ALL  BEEN  SCALED  TO  HAVE  LFNOTH  1.0, 
ANORM  SHOULD  TNCRFASE  TO  ROUGHLY  SOPTfNl . 

A RADICALLY  DIFFERENT  VALUF  FOR  ANORM  MAY 
INDICATE  AN  ERROR  TN  ONE  OF  THE  SUBROUTINES 
ATIMES  OR  ATRANS. 

ACOND  OUTPUT  AN  ESTIMATE  OF  COND(A) , THE  CONDITION 

NUMBER  OF  A.  A VERY  HTGH  VALUE  OF  ACOND 
MAY  AGATN  INDICATE  AN  ERROR  IN  SUBROUTINES 
ATIMES  OR  ATRANS. 


TO  CHANGE  PRECISION 


IF  SUBSTITUTE  BLAS  ROUTINES  ARE  IN  USE,  ALTER  THE  WORDS 
ABS,  REAL  , SORT 

THROUGHOUT  ROUTINES  LSQR,  NORMLZ , SAXPY,  SNRM2. 

IF  AUTHENTIC  BLAS  ROUTINES  ARE  TN  USE,  ALTER  THE  WORDc 
ABS,  REAL  , SAXPY,  SNRM2 , SORT 
THROUGHOUT  ROUTINES  LSOR,  NORM LZ . 


REFERENCE  A B I DI AGONA LI Z ATION  ALGORITHM  FOR  SPARSE 

LINEAR  EOUATTONS  AND  LEAST-SOU A RES  PROBLEMS, 

TECHNICAL  REPORT  SOL  78-19,  DEPARTMENT  OF 
OPERATIONS  RESEARCH,  STANFORD  UNTVFRSTTY,  1978 


AUTHORS  C.C.  PAIGE 

MCGILL  UNIVERSITY,  CANADA 


M.A.  SAUNDERS 
DSTR,  NEW  ZEALAND 


LSQR.  THIS  VERSION  DATED  2 OCTOBER  1978. 


SUBROUTINES  AND  FUNCTIONS 


USER 

LSOR 

BLAS 

FORTRAN 


ATIMES, ATRANS 
NORM  LZ 
SAXPY 

ABS, MOD, SORT 


I 


i 


1 


mmmmmammmrnimm* 


■~i  o n n nno 
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FUNCTIONS  AND  LOCAI,  VARIABLES 


INTEGER 

REAL 

1 

2 

3 

4 


I , TTN  , MOP  , NCOMV,  NSTOP 

ABS  , ALFA  , ARNORM  , BBNORM  , BETA  , BNOPM  , 

CS , CS2  , CTOL  , DDNORM  , DELTA  , sa mva , GB A R , 

ONE , PH I , RBAR,RHO,RHS,RTOL, 
STNES,SN,SN2,SOR'f',T,TEST1  , TFST2  , TEST"* , 
THFTA ,T1 ,T2 ,T3 , XNORM , XXNORM , 7 , ZBAR , ZERO 


INTTT ALTZF 


Z F RO  = 0 
ONE  = 1 

IF  (NOUT  .GT.  0)  WRTTF(NOUTf  1 PI  PI  PI  > M , N , ATOL  , BTOL  , CONLTM  , TTNLTM 
CALL  NORMLZ ( B , U , M , BETA  ) 

CALL  ATRANB(  U,P,M,N  ) 

CALL  NORMLZf  P,V,N,ALFA  ) 


2 PI 


C 


C 


DO  2 PI  I = 1 , N 
W(I)  = V(I) 
X ( T ) = ZERO 
SE(T)  = ZERO 
CONTINUE 


SINES 

= 

ONE 

BBNORM 

= 

ZERO 

DDNORM 

= 

ZERO 

XXNORM 

= 

ZERO 

Z 

= 

ZERO 

CS  2 

= 

-ONE 

SN2 

= 

ZERO 

CTOL 

= 

ZERO 

IF  (CONLTM  .G' 

ITN 

= 

0 

TSTOP 

= 

Pi 

NSTOP 

= 

0 

ANORM 

= 

ZERO 

ACOND 

= 

ZERO 

RBAR 

= 

ALFA 

BNORM 

= 

BETA 

RNOPM 

= 

BETA 

XNORM 

= 

ZERO 

ARNORM 

= 

ALFA' 

ZERO)  CTOL  = ONE/CONLTM 


rF  (NOUT 
[F  (NOUT 
[F  (NOUT 


.GT. 
• GT. 
.GT. 


0)  WRITE (NOUT, 
0)  WRITE (NOUT, 
0)  WRITE (NOUT, 


[F  (ARNORM  . LE.  ZERO)  GO  TO 


1 200) 
1 400) 
1500) 
700 


ITN , X ( 1 ) , RNOPM , ARNORM 


o o n o o 
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C 

C 

C MAIN  ITERATION  LOOP 

C 

r* 

C PERFORM  NEXT  STEP  OF  THE  B I DI  AGON  A I,T  2 AT  TON 

C TO  OBTAIN  NEW  BETA,  U,  ALFA,  V 

C 

100  ITN  = TTN+1 

CALL  ATI MES ( V,P,M,N  ) 

CALL  SAXPY ( M, (-ALFA) ,U,1 ,P,1  ) 

CALL  NORMLZ ( P,U,M,BETA  ) 

C 

BBNORM  = BBNORM  + ALFA*  * 2 + BETA*  *2 
C 

CALL  ATRANSI  U,P,M,N  ) 

CALL  SAXPY ( N, (-BETA) ,V, 1 ,P, 1 ) 

CALL  NORMLZ ( P,V,N,ALFA  ) 


COMPUTE  NEXT  PLANE  ROTATION  FOR  THE  OR-  FACTOR  T 7,  AT  TON 
OF  THE  LOWER  BIDIAGONAL  MATRIX  B (T.F.  O.B  = R) 


RHO 

CS 

SN 

THETA 

RBAR 

PHI 

RNORM 

ARNORM 

SINES 


SORT ( RBAR*  * 2 + BETA*  *2) 
RBAP/RHO 
BETA/ RHO 
SN* ALFA 
- C S * A L F A 
CS* RNORM 
SN* RNORM 
''BS(RBAR)  * RNORM 
SN*S I NFS 


UPDATE  X AND  THE  STANDARD  ERROR  ESTIMATES 

T 1 = PUT /RHO 
T2  = -THETA /RHO 
T3  = ONE/RHO 
CALL  SAXPY ( N,T1 ,W,1 ,X,1  ) 

DO  350  I * 1 , N 
T * W(  I) 

W(I ) = T*T2  + V(  I) 

T = (T*T1) **2 

SE(I)  = T + S E ( I ) 

DDNORM=  T f DDNORM 
350  CONTINUE 


non  nnooonn  on on  noon 
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ESTIMATE  NORM  ( X)  USING  AN  LO-FACTOR  I 7,  ATION 
OF  THE  UPPER  BTDTAGONAL  MATRIX  R (T.E.  P.0  = L) 


DELTA 

GBAR 

RHS 

ZBAR 

XNORM 

GAMMA 

CS2 

SN2 

Z 

XXNORM 


SN2*  RHO 
-CS2*  RHO 
PHI  - DELTA*  Z 
RHS/GBAR 

SQRT(XXNORM  + ZBAR*  * 2 ) 

SORT (GBAR*  * 2 + THETA* *2) 

GBAR/GAMMA 

THFTA/GAMMA 

RHS/GAMMA 

XXNORM  + 7**2 


TEST  FOR  CONVERGENCE 

ANORM  = SORT ( BBNOFM) 

ACOND  = ANORM*  SORT ( DDNOPM ) 

PTOL  = RTOL  + ATOL*ANORM*xVORM/RNORM 
TEST!  = SINES 

TEST2  = ARNORM/ ( ANORM* RNORM) 

TEST?  = ONF/ACOND 

THE  FOLLOWING  THREE  TESTS  ARE  INDEPENDENT  OF  THE  TOLERANCES 
ATOL,  BTOL  AND  CONLIM.  THEY  ARE  INTENDED  TO  GUARD  AGAINST 
ACCIDENTAL  SETTING  OF  THOSE  PARAMETERS  TO  EXTREME  VALUES. 
THEY  APE  EQUIVALENT  TO  THE  NORMAL  TESTS  USING  THE  VALUES 
ATOL  = EPS,  BTOL  = EPS,  CONLIM  = 1/EPS. 


T1  = ONE  + TEST1 
T2  = ONE  + TEST? 
T3  = ONE  + TEST? 


IF 

(T3 

. LE. 

ONE) 

T STOP  = 

7 

IF 

(T2 

. LE. 

ONE) 

ISTOP  = 

IF 

(T 1 

. LE. 

ONE) 

ISTOP  = 

S 

ALLOW  FOP  TOLERANCES  SET  BY  USER 


IF 

( ITN 

.GE. 

ITNLIM) 

ISTOP  = 

4 

IF 

( TEST  3 

. LE. 

CTOL 

) 

ISTOP  = 

3 

TF 

(TEST? 

. LE. 

ATOL 

) 

ISTOP  = 

2 

IF 

( TEST1 

.LE. 

RTOL 

) 

ISTOP  = 

1 

. - 


onnn  nonooooo  non  noon 
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SEE  IF  IT  IS  TIME  TO  PRINT  SOMETHING 


IF  (NOUT  . LE.  0)  GO  TO  600 


IF 

(M.LE.40 

.OR.  N . LE .40) 

GO 

TO 

400 

IF 

(ITN  .LE 

. 10) 

GO 

TO 

400 

IF 

(ITN  .GE 

. ITNLIM-1 0) 

GO 

TO 

400 

IF 

(MOD (ITN 

,10)  .EQ.  0) 

GO 

TO 

400 

IF 

( TEST3  . 

LE.  2 . 0*CTOL) 

GO 

TO 

400 

IF 

( TEST2  . 

LE.  1 0 . 0 * ATOL) 

GO 

TO 

400 

IF 

(TEST1  . 

LE.  1 0 . 0*  RTOL) 

GO 

TO 

400 

GO  TO  600 

PRINT  A LTNE  FOR  THIS  ITERATION 
400  CONTINUE 

WRI TE ( NOUT , 1400)  ITN , X ( 1 ) , RNORM , ARNORM , TEST! , TEST2 , ANORM , AC^ND 
IF  ( MOD ( ITN ,10)  .EQ.  0)  WRITE(NOUT,  1S00) 


STOP  IF  POSSIBLE. 

THE  CONVERGENCE  CRITERIA  ARE  REQUIRED  TO  BE  MET  ON  NCONV 
CONSECUTIVE  ITERATIONS,  WHERE  NCONV  IS  SET  BELOW. 
SUGGESTED  VALUE  — NCONV  =1,2  OR  1 

600  IF  (ISTOP  .EQ.  0)  NSTOP  = 0 
IF  (rSTOP  .EQ.  0)  GO  TO  100 
NCONV  = 1 
NSTOP  = NSTOP+1 

IF  (NSTOP  .LT.  NCONV  .AND.  ITN  .LT.  TTNLTM)  ISTOP  = 0 
IF  (ISTOP  .EQ.  0)  GO  TO  100 


END  OF  ITERATION  LOOP 


non  n o non  n n 
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PRINT  SUMMARY  OF  FINAL  SOLUTION  STATUS 
7PI 0 CONTINUE 


IF 

(NOUT  . LE. 

0) 

GO  TO  800 

WRITE (NOUT,  1900) 

ITN, ISTOP 

IF 

(I STOP  . EO. 

0) 

WRITE (NOUT, 

7000) 

IF 

(ISTOP  .EO. 

1) 

WRITE (NOUT, 

2100) 

IF 

(ISTOP  .EO. 

2) 

WRITE (NOUT, 

2200) 

IF 

(ISTOP  . EO. 

1) 

WRITE (NOUT, 

2300) 

IF 

(ISTOP  . EO. 

4) 

WRITE  (NOU'1' , 

2400) 

IF 

(ISTOP  .EO. 

5) 

WRITE (NOUT, 

2500) 

IF 

(ISTOP  .EO. 

6) 

WRITE (NOUT, 

26  00) 

IF 

(ISTOP  .EO. 

7) 

WRITE (NOUT, 

2700) 

COMPUTE  FINAL  RESIDUAL  R = B - A.X,  AND  A ( TRANSPOSE) .R 

800  T1  = RNORM 
T2  = ARNORM 
T3  = XNORM 

CALL  ATT MES ( X,P,M,N  ) 

DO  900  I = 1,  M 

U(I)  = B ( I ) - P ( T ) 

900  CONTINUE 

CALL  NORMLZ ( U,P,M, RNORM  ) 

CALL  ATRANS ( U,P,M,N  ) 

CALL  NORMLZ ( P,W,N, ARNORM  ) 

CALL  NORMLZ ( X,W,Nf XNORM  ) 

IF  (NOUT  .GT.  0)  WRITE  (NOUT  , "3000)  BNORM  , ANORM  , ACOND , TI  ,T2,T3, 
1 RNORM, ARNORM, XNORM 

FINISH  OFF  THE  STANDARD  ERROR  ESTIMATES 

T = ONE 

IF  (M  .GT.  N)  T = M-N 
T = RNORM/SQRT (T) 

DO  950  T = 1,  N 

SE(T)  = T*SQRT(SE(T) ) 

950  CONTINUE 
C 

RETURN 
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C 

C 

10(10  FORMAT  ( 

1 //  25X,  46HLSQR  — LEAST-SQUARES  SOLUTION  OF  A.X  = B 

2 / 25X , 46 H ( I . E . MINIMIZE  NORM ( R)  , WHERE  R = B - A.X  1 

3 //  25X,  29HTHE  MATRIX  A HAS  DIMENSIONS,  T6 , 5H  BY,  16 

4 //  25X , 8HATOL  =,  1PE10.2,  10X,  «H8TOL  =,  1PE10.2 

5 / 25X , 8HCONLTM  =,  1PE10.2,  10X,  8HITNLTM  =,  lid) 

1200  FORMAT ( / / 3X , 3HITN , 9X,  4HXM),  14X,  7HNORM ( R) , RX,  9HNORM(A.R), 

1 3X,  23HCOMPATIBLE  INCOMPATIBLE,  4X,  7HNORM ( A ) , 4X,  7HC0ND(A)  /) 

1400  FORMAT (16,  1PE20.10,  1PE19.10,  1PE12.3,  1P2E1I.3,  1P2E11.2) 


1500  FORMAT ( IX) 

1900  FORMAT( 

1 / 19H  NO.  OF  ITERATIONS  , 110 

2 //  19H  STOPPING  CONDITION,  110) 

2000  FORMAT ( 1H+ , 34X,  44H (THE  EXACT  SOLUTION  IS  X = 0)  ) 

2100  FORMAT  ( 1 H+ , 34X,  44H(NORM(R)  TS  SMALL  ENOUGH,  GIVEN  ATOL,  BT<">L ) ) 

2200  FORMAT ( 1 H+ , 3AX,  44H ( NORM ( A . R)  IS  SMALL  ENOUGH,  GIVEN  ATOL)  ) 

2300  FORMAT ( 1H+ , 34X,  44H(COND(A)  HAS  EXCEEDED  CONLIM)  1 

2400  FORMAT ( 1 H+ , 34X,  44H ( ITERATION  LIMIT  REACHED)  ) 

2500  FORMAT ( 1 H+ , 34X,  44H(NORM(R)  TS  SMALL  ENOUGH  FOR  THIS  MACHINE)  1 

2600  FORMAT ( 1H+ , 34X,  44H (NORM ( A . R ) IS  SMALL  ENOUGH  FOR  THIS  MACHINE)  ) 

2700  FORMAT ( 1 H+ , 34X,  44H(COND(A)  IS  TOO  LARGE  FOR  THIS  MACHINE)  ) 


3000  FORMAT ( 


1 

/ 

19H 

NORM ( B) 

TRUE  , 

1PE20.10 

2 

5X, 

19H 

NORM (A) 

ESTIMATE, 

1PE15.5 , 

3 

5X , 

19H 

COND(A) 

ESTIMATE, 

1 PE 15.5 

4 

// 

19H 

NORM (R) 

ESTIMATE, 

1PE20 .10 

5 

5X, 

19H 

NORM ( A . R) 

ESTIMATE, 

1PE1S.5, 

6 

5X , 

19H 

NORM ( X ) 

ESTIMATE, 

1PE1 5 . 7 

7 

/ 

19H 

TRUE, 

1PE20.10 

8 

5X, 

19H 

TRUE, 

1 PEI  5 . 5 , 

9 

5X, 

19H 

TRUE, 

1 PEI  5.7) 

C END  OF  LSQR 

END 


noon  n non  o n n on 
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SUBROUT  T NR  N0RML7.  ( P,V,n,BETA  ) 

INTEGER  N 

REAL  P(N) , V ( N ) , BETA 

C 

C NORN  LZ  TS  REQUIRED  BY  SUBROUT  TNF  I^OP. 

C IT  NORMALIZES  THE  VECTOR  p AND  RETURNS  THF  REStILT  TN  V. 

C ON  RETURN,  BETA  = NORM(P)  AND  RFTA*V  = P. 

C 

C FUNCTIONS 

C 

C BLAS  SNRM2 

C 

INTEGFR  I 

REAL  ONE, SNRM2,T, ZERO 


ZERO  = 0 
ONE  = 1 

BETA  = SNRM 2 ( N,P,1  ) 

IF  (BETA  . LE.  ZERO)  RETURN 
T = ONE/BETA 


DO  20  I = 1,  N 
V(I)  = P(T)*T 
20  CONTINUE 
RETURN 

END  OF  NORMLZ 
END 


SUBROUTINE  SAXPY(  N , A , X , TNCX , Y , INC Y ) 

REAL  A , X ( N ) , Y ( N ) 

THIS  MAY  BE  REPLACED  BY  THE  CORRESPONDING  BLAS  ROUTINE. 
THE  FOLLOWING  TS  A SIMPLE  VERSION  FOR  USE  WITH  LSQR. 

DO  1 0 I = 1 , N 

Y ( I ) = A*  X ( I ) + Y ( I ) 

10  CONTINUE 
RETURN 
FND 


REAL  FUNCTION  SNRM2(  N,X,TNCX  ) 

REAL  X ( N ) 

THIS  MAY  BE  REPLACED  BY  THP  CORRESPONDING  BLAS  ROUTINE. 
THE  FOLLOWING  IS  A SIMPLE  VERSION  FOR  USE  WITH  LSQR. 

SNRM2  = 0 
DO  10  I = 1,  N 

SNRM2  = X ( I ) * * ? + SNPM2 
10  CONTINUE 

SNRM2  = SORT ( SNRM  2 ) 

RETURN 

END 


fJi*' 


n o n n n n non  non  n nn  nnn 
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SUBROUTINE 
I NT EGER 
REAL 

COMPUTE  P 


ATIMF.S  ( V,P,M,N  ) 

M ,N 

V (N ) ,P(M) 

= A . V FOR  TEST  MATRIX 


A 


INTEGER 

REAL 

REAL 

COMMON 


I ,Nl 
Z FRO 

D , R , XTRUE , Y , Z 

/LSCOMM/  D(100)  , R ( 1 0 0 ) , XTRUE  ( l 00 ) ,Y(100)  , Z I 1 0 0 > 


CALL  HPROD ( V,P,Z,N  ) 

DO  100  1=1, N 

P ( I ) = D ( I ) *P ( I ) 

100  CONTINUE 

TF  (M  .LE.  N)  GO  TO  500 
ZERO  = 0 
N1  = N+l 
DO  200  I =N 1 , M 
P ( I ) = ZERO 
200  CONTINUE 

J 

500  CALL  HPROD ( P,P,Y,M  ) 

RETURN 

END  OF  ATIMES 
END 


SUBROUTINE 

INTEGFR 

REAL 


AT R A NS ( U,P,M,N  ) 
M,N 

U (M)  , P (N) 


COMPUTE  P = A(TRANSPOSE) .U  FOR  TEST  MATRTX  A 


INTEGER  I 


REAL  D,R, XTRUE, Y,Z 

COMMON  /LSCOMM/  D ( 1 0 0 ) , R ( 1 00 ) , XTPUF (1 00 ) , Y (1 00 ) 


7(100) 


CALL  HPROD ( U,P,Y,M  ) 

DO  100  I = 1 , N 

P ( I ) = D ( I ) *P ( T ) 
100  CONTINUE 

CALL  HPROD ( P,P,7,N  ) 
RETURN 

END  OF  ATRANS 
FND 
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SUBPOUTINF  HP  ROD  ( 7 , P , Z , N ) 

INTEGER  N 

RE  A L PIN)  ,V(N)  ,7(N) 

r 

C APPLY  HOUSEHOLDER  TRANSFORMA tt  N ^ rPT  P = (T  - 7 7 7 ' ) . V 

C 

INTEGER  t 

REAL  S 

C 

S = 0 

DO  100  I = l,N 

S = Z (I) *V(T)  + S 
100  CONTINUE 

C 

s = s + s 

DO  200  1 = 1 , N 

P ( T ) = V(I)  - S*Z(I) 

200  CONTINUE 

RETURN 

END  OF  HPROD 
END 


SUBROUTINE  LSTP(  M , N , NDUPLC , NPOWEP , B , ACONO , RNORM  ) 
INTEGER  M,N, NDUPLC, NPOWER 

REAL  B(M) , ACOND , RNORM 

GENERATE  A SPARSE  LEAST-SQUARES  TEST  PROBLEM,  A.X  = B, 
WHERE  A = Y.D.Z  IS  M BY  N,  D IS  DIAGONAL,  AND 
Y AND  Z APE  HOUSEHOLDER  TRANSFORMATIONS. 

FUNCTIONS  AND  SUBROUTINES 


TESTPROB  ATIMES, HPROD 
LSQR  NORMLZ 

FORTRAN  COS , FLOAT, SIN 


INTEGER 

REAL 

REAL 

COMMON 


I , J , MN 

ALFA, BETA, COS, FLOAT, FOURPT , SIN, T 
D , R , XTRUE , Y , Z 

/LSCOMM/  D ( 1 00 ) , R ( 1 0 0 ) , XTRUE (100)  ,Y(100)  ,Z(100) 


MAKE  T’WO  VECTORS  OF  NORM  1.0,  FOR  HOUSEHOLDER  TRANSFORMATIONS 


FOURPT  = 4.0*1.141592 
ALFA  = FOURPT /FLOAT ( Ml 

BETA  = FOURPT/FLOAT (N) 

DO  100  I = 1 , M 

Y(T>  = S IN ( FLOAT  IT) * ALFA) 
100  CONTINUE 

DO  200  1=1, N 

Z ( T > = COS ( FLOAT ( I ) *BFTA) 
200  CONTINUE 


. 
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C 

CALL  NORMLZ ( Y,Y,M,ALFA  ) 
CALL  NORMLZ  ( Z,Z,N,BF.TA  ) 
C 


SET  SINGULAR  VALUES  FOR  DIAGONAL  MATRIX  D 


300 

C 

C 

C 

C 

C 


400 

C 

C 

C 

C 


c 

c 

c 

c 


500 

c 


600 

C 

c 


700 

C 


C 

C 


DO  300  1 = 1 , N 

J = ( I-l+NDUPLC) /NDUPLC 
T = J *NDUPLC 
T = T/FLOAT (N) 

D ( I ) = T*  *NPOWER 
CONTINUE 

ACOND  = D(N)/D(1) 


SET  SOLUTION 

XTRUE 

DO  400  1=1, N 
XTRUE(I)  = 
CONTINUE 

N-I 

COMPUTE  RHS 

B 

CALL  ATI MES ( 
IF  (M  .LE.  N) 

XTRUE, B,M,N  ) 
RETURN 

FOR  LEAST  SQUARES,  ADD  RESIDUAL 

DO  500  1=1, N 
R ( I ) = 0.0 
CONTINUE 


T = 1 
MN  = M-N 
DO  600  1=1, MN 
J = N+I 

R ( J)  = T*  FLOAT ( I ) /FLOAT ( M) 
T = -T 
CONTINUE 

CALL  HPROD ( R , R , Y , M ) 

DO  700  1=1, M 

B(I)  = B ( I ) + R ( I ) 

CONTINUE 

CALL  NORMLZ ( R , R , M , RNORM  ) 
RETURN 

END  OF  LSTP 
END 


R 
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SUBROUTINE  TEST ( M , N , NDUPLC , NPOWFR  ) 

INTEGER  M,N, NDUPLC, NPOWFR 

EXAMPLE  DRIVER  ROUTINE  FOR  TESTING  LSOR 

INTEGER  I STOP , I TNLTM , J , NOUT 

REAL  B(100)  , P ( 10  0 ) ,U(100)  ,V(100)  , 

1 W(100)  ,X(100)  , SE ( I 00)  , 

2 ATOL , BTOL , CONLI M , RNORM , »NORM , ACOND 


GENERATE  SPECIFIED  TEST  PROBLEM 

CALL  LSTP ( M, N, NDUPLC, NPOWER,B, ACOND, RNORM  ) 

NOUT  = 6 

WRITE ( NOUT , 1000)  NDUPLC, NPOWER, ACOND, RNORM 

SET  TOLERANCES  FOR  LSQP 

ATOL  = 1.0E-10 

BTOL  = ATOL 

CONLIM  = 1 . 0E+10 

IF  (M  .GT.  N)  CONLIM  = 1.0E+5 

TTNLIM  = 100 

CALL  LSQR ( M , N , M , B , P , U , V , W , X , SE , 

1 ATOL , BTOL , CONLIM , ITNLTM , NOUT , I STOP , RNORM , ANORM , ACOND  ) 

OUTPUT  RESULTS 

WRITE ( NOUT , 2000) 

WRJTEfNOUT,  4000)  ( J , X ( .7 ) , J=1,N) 

WRITE ( NOUT , 3000) 

WRITE ( NOUT , 4000)  (J , SE ( J) , J=1 ,N) 

RETURN 

1000  FORMAT ( 1 HI 

1 / 28H  LEAST-SQUARES  TEST  PROBLEM. 

2 //  25H.  SINGULAR  VALUES  REPEATED,  13,  8H  TTMFS . , BX, 

3 15H  POWER  FACTOR  =,  13 

4 //  1 1 H COND(A)  =,  1 PE 12.4,  8X, 

5 11H  NORM ( R ) =,  1PE20.10) 

2000  FORMAT!///  PH  SOLUTION) 

3000  FORMAT!/  16H  STANDARD  ERRORS) 

4000  FORMAT  (5(17,  1.PE17.0)) 

END  OF  TEST 
END 


EXAMPLE  MAIN  PROGRAM 

CALL  TEST!  10,10,1,6  ) 
CALL  TEST!  80,40,4,2  ) 
STOP 
END 
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APPENDIX  B:  OUTPUT  PROM  LSQR 


The  following  listings  illustrate  the  solution  of  two  test  problems 
using  the  Fortran  routines  in  Appendix  A.  The  problems  are: 

P( 10 ,10 ,1 ,6 ) - an  ill-conditioned  compatible  system  Ax  = b, 

P( 80, 40, 4, 2)  - a reasonably  well -conditioned  least-squares  problem, 

min||/lx  - i>  ||  2 , 

where  problem  P(m,n,d,p)  is  defined  in  section  8.1.  The  machine  used  was  a 
Burroughs  B6700.  The  particular  tolerances  input  to  LSQR  (namely  ATOL  = 
BTOL  = 1.0E-10)  requested  slightly  less  than  maximum  attainable  precision 
for  this  machine. 


The  quantities  output  by  subroutine  LSQR  each  iteration  are  as  follows. 
They  are  expressed  in  terms  of  the  current  approximate  solution  vector 
and  the  corresponding  residual  vector  = b - Ax 


ITN 


The  iteration  number.  For  larger  problems  a line  is  printed 
every  tenth  iteration. 


X(l) 

NORM(R) 

NORM  (A.  R) 
COMPATIBLE 


The  value  of  the  first  component  of  x ^ . 

The  value  of  ||r^,|| . This  converges  to  zero  if  Ax  = b is 
compatible;  otherwise  to  a positive  limit. 

T 

The  value  of  ||/1  r^|| . This  converges  to  zero  in  all  cases. 

A dimensionless  quantity  which  should  converge  to  zero 
if  and  only  if  Ax  = b is  compatible.  It  is  a product  of 
sines  s^... which  estimates  1 1 1 1 /||fc||. 


INCOMPATIBLE  A dimensionless  quantity  which  should  converge  to  zero 
if  and  only  if  the  optimum  residual  r = b - Ax  is  nonzero. 
It  is  an  estimate  of  |MTrfe||  / f|l4||F||rfe||;. 

NORM(A)  A monotonically  increasing  estimate  of  ||a||  . 

F 

COND(A)  A monotonically  increasing  estimate  of  condMl  = |M|f||^+||f,. 
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